May 14 2021
Data Analytics

Data Tools Make the Tally Count for the U.S. Census Bureau

Analytics powered by Splunk kept vast pools of information organized and moving.

As hundreds of thousands of census takers prepared to canvass the nation’s main streets, dirt roads, college campuses and skyscrapers last year, COVID-19 stopped them in their tracks.

But the pandemic shutdowns were just the first of many obstacles threatening the once-per-decade count of the U.S. population, among them wildfires, hurricanes, civil unrest, tensions over the election and cyberattacks.

This time, the complex and detailed project was assisted by data analysis tools. All of the information about the countless factors affecting the operation was funneled through a U.S. Census Bureau fusion center powered by Splunk’s Data-to-Everything platform.

“This is my third census, and I’ve never been through a situation where we had so many reasons that we might not finish,” says Michael Thieme, assistant director for decennial census programs, systems and contracts at the U.S. Census Bureau. “But in the end, we did finish.”

He credits the success to automation. The 2020 census was the first fully digital decennial count, allowing U.S. residents to submit responses to the census via a cloud-based portal. More than 53.5 percent of respondents took advantage of the new capability.

Data Steered the Census Bureau Through the Count

In addition, enumerators followed up with residents who did not respond to the initial survey by visiting them at their homes, using iPhones (configured by CDW•G as a Device as a Service solution) to collect data and communicate with census supervisors.

The Census Bureau’s fusion center pulled data feeds from a number of locations: from the U.S. Centers for Disease Control and Prevention, about COVID-19 outbreaks; from state and local agencies, to monitor weather and natural disasters; and from the enumerators and the internet self-responses, to manage census operations.

Census directors could see in real time which geographic regions’ response rates were lagging and devote more resources to those areas, and they could use data from the security operations center to respond to cyberthreats in real time.

“That helped us make real-time, data-driven decisions,” says Thieme. “All of this data was coming in every day so that we could do an analysis and make sure that what we were asking people to do made sense.”

The time was ripe for an operation such as this, says Laura DiDio, principal analyst for Information Technology Intelligence Consulting. Data analytics helped agencies across government respond to the COVID-19 crisis. It offers “everything from the 30,000-foot view to these tidbits of data that give you access to actionable insights,” DiDio says.

“Knowledge is power,” she adds. “That’s the real message of what has happened under COVID.”

Tech Helped the Census Bureau Make Up for Lost Time

In the 2010 census, the bureau had separate systems to gather data and manage operations. Digitizing the census helped pare down that list of technologies, but the bureau’s infrastructure still consisted of 52 systems, everything from operating systems and databases to middleware and devices.

145 million

The number of U.S. addresses in the Census Bureau’s 2020 database

Source: U.S. Census Bureau

The bureau used the data to help monitor field operations, including distribution of devices, employee training and payroll execution. “We have a week-by-week budget that we track as operations continue, so we can make adjustments if it looks like we’re somehow off track,” Thieme says.

The bureau was even able to use the data to reduce workloads throughout the operation. In the past, even if a resident returned a census questionnaire by mail, an enumerator might still show up at the door because the paper form hadn’t yet been processed. In 2020, address lists were updated within minutes as responses came in by phone, online or from the field.
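The workload reduction described above comes down to removing an address from the follow-up list as soon as any response for it arrives. A minimal sketch of that idea follows; the function name, the address IDs and the channel labels are hypothetical and are not drawn from the bureau's actual systems.

```python
from typing import Iterable


def apply_responses(
    follow_up: set[str], responses: Iterable[tuple[str, str]]
) -> set[str]:
    """Return the addresses that still need an in-person visit.

    `follow_up` holds hypothetical address IDs awaiting a visit.
    `responses` yields (address_id, channel) pairs, where channel is an
    illustrative label such as "phone", "internet" or "field".
    """
    remaining = set(follow_up)  # copy so the caller's set is not mutated
    for address_id, _channel in responses:
        # Any response, whatever the channel, clears the address.
        remaining.discard(address_id)
    return remaining


if __name__ == "__main__":
    pending = {"A1", "A2", "A3"}
    incoming = [("A2", "internet"), ("A3", "phone")]
    print(sorted(apply_responses(pending, incoming)))
```

Running the update each time responses arrive, rather than in a nightly batch, is what would keep an enumerator from being sent to a household that has already answered.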

Splunk enabled single-platform monitoring of those varied systems to provide the bureau with a holistic view of its operations, explains Thieme. “Splunk’s data aggregation tool let us build on-the-fly dashboards for almost anything we thought we needed,” he says.

Another major timesaver was in-office address canvassing. The census counts every person living at a particular address, so it’s crucial to ensure the accuracy of the bureau’s address database and to make sure that each address still exists. In the past, canvassers did that by going door to door in advance of the actual count.

“In-office address canvassing actually allowed us to do 65 percent of the country without leaving our offices,” says Thieme.

Data Feeds Provided a Path for Census Planning

Such efficiencies made up for the two and a half months that the bureau lost in field canvassing due to chaotic conditions across the country. Enumerators were supposed to visit nonresponding households between May 13 and July 31, but the national in-person count was delayed until July 16 because of the pandemic.

Locally, some enumerators were pulled off the streets in neighborhoods hit so hard by wildfires that the smoky air was unbreathable; others had to hunt down residents displaced by major hurricanes. As a result, the deadline for finishing was extended to Oct. 15.

“Through that whole process, we had to pause field data collection and restart it many times,” says Thieme. “We had to replan and take a phased approach.”

Each morning, bureau executives and fusion center staff would meet to review the feed from the fusion center on everything from COVID-19 infection rates to protest locations and the air quality in areas near wildfires.

Based on that feed, they would use a stoplight system to determine which of their 248 offices to open or close. If an office was on red, it would be closed for the day. Green meant it was safe to resume operations.
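As a rough illustration of how such a stoplight rule could work, the sketch below marks an office red when any monitored hazard crosses a cutoff and green otherwise. The article does not describe the bureau's actual criteria, so the field names and threshold values here are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; the bureau's real criteria are not stated in the article.
MAX_CASES_PER_100K = 25.0
MAX_AIR_QUALITY_INDEX = 150


@dataclass
class OfficeFeed:
    """One office's daily readings (illustrative fields only)."""
    office_id: str
    covid_cases_per_100k: float
    air_quality_index: int
    unrest_reported: bool


def stoplight(feed: OfficeFeed) -> str:
    """Return "red" if any hazard exceeds its cutoff, otherwise "green"."""
    unsafe = (
        feed.covid_cases_per_100k > MAX_CASES_PER_100K
        or feed.air_quality_index > MAX_AIR_QUALITY_INDEX
        or feed.unrest_reported
    )
    return "red" if unsafe else "green"


def offices_to_open(feeds: list[OfficeFeed]) -> list[str]:
    """List the IDs of offices that are green for the day."""
    return [f.office_id for f in feeds if stoplight(f) == "green"]
```

Treating any single hazard as enough to close an office is the conservative choice in this sketch; a real system might weigh the feeds differently.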

Extending the operation introduced new challenges, Thieme adds. Although the bureau is already required to keep individual responses confidential for 72 years, it was even more cautious than usual about security and privacy because of the digital nature of the 2020 census.

“The more days you’re open, the more days you’re vulnerable, and adding another three months to our data collection meant that we were on high alert much longer than we intended to be,” says Thieme.

Could Data Processing Move as Quickly as Data Collecting?

The extension was also complicated from a staffing perspective. The bureau had signed contracts and booked contractors through a specified date, and it had to extend many of those agreements to ensure full coverage, he says.

“I think 2020 changed all of us, but it also broadened the scope about what we should plan for,” Thieme says. “Nobody planned for this.”

As he looks toward the next census, he would like to revisit the way the front- and back-end systems interact. He wonders if rethinking the way data is collected, the way the bureau builds its databases and the way it connects data sources could make the data-processing stage run as efficiently as data collection.

“Instead of the census taking three, four or five months to process, why can’t we be done 10, 20 or 30 days after we’ve done data collection?” he suggests. “It has all the room, I think, for innovation.”

blackred/Getty Images