Millions of people are missing from CDC COVID data as states fail to report cases
Colorful maps on the new online Health Equity Tracker reveal how the COVID-19 pandemic has affected different races and age groups across the United States, but you can tell something is not quite right.
A handful of states are grayed out, and that's not because they've escaped the pandemic.
"There's no data coming out of Texas," points out software engineer Josh Zarrabi of Atlanta's Morehouse School of Medicine, which recently rolled out the tracking portal. "A lot of Americans should be unhappy about that. And they should say, like, 'Wow, like, we need the data, right, because we're missing a huge piece of the puzzle here.' "
And it's not just a search for jigsaw pieces from Texas.
The Centers for Disease Control and Prevention has tallied over 39 million cases of COVID-19 in the U.S. More detailed information, such as where patients live, whether they were hospitalized or died, and demographic details like race, gender and age, is gathered separately.
In that more detailed CDC data set, about 1 in 5 known cases — or 7 million people — are completely missing, an NPR analysis found. On top of that, about two-thirds of the data present aren't usable, as health care providers marked fields as "Unknown" or simply left them blank.
Most states have voluntarily sent up whatever records they have, but a handful have not. Texas, Missouri, Louisiana, West Virginia and Wyoming have each submitted less than one-tenth of their total cases. Another handful of states, including Florida, Michigan and Kentucky, have smaller but still significant gaps in their data, each completely missing over 30 percent of their known cases.
Over 3 million Texans have had COVID-19, but just 81,000 are in the data set. That's not even 3 percent.
"That is ludicrous. It is shameful. It is wrong," says Nancy Krieger, a social epidemiologist at Harvard University. "You need good data to do proper planning, to understand what the risk is, how the risk is changing. And you need that to be real data that are publicly available and accessible."
The standardized details that states feed into this data set help the CDC track COVID-19's spread, evaluate demographic trends and develop health guidance for at-risk groups and the country as a whole. However, the completeness of these records varies widely.
Just 1 percent of the records present are missing the patient's age or sex, while 36 percent of them have race and/or ethnicity marked as "Unknown" or left blank. The CDC asked states to indicate whether patients experienced any of 15 symptoms, including fever, chills or muscle aches, but states largely have not done so. Health departments may update these details as they conclude case investigations, but for more than 90 percent of patients, most of these fields are currently blank or marked "Unknown."
At the CDC, Paula Yoon directs the surveillance of about 120 infectious diseases, including COVID-19. Her epidemiologists use field studies and lab reports to fill in the blanks the best they can. However, Yoon agrees their job would be simpler if they just had all that data.
"Yes, we would be in a much better place," Yoon says. "It's not because the states are not sharing those data with us. It's because the states don't have those data themselves."
Why states aren't submitting data
There are plenty of reasons why some states haven't submitted their data. Chief among them is the fact that public health has been underfunded for years. This has created a county-by-county patchwork of outdated disease-tracking systems linking hospitals and labs to public health departments.
Many counties have their own local tracking systems that can't automatically transfer records to the state and then up to the CDC.
At the Jackson County Health Department near Kansas City, Mo., communicable disease division manager Chip Cohlmia jokes that public health is keeping fax machines alive. In his county and many others, hospitals fax records to county health departments, and workers manually enter data.
"It's like having an old car, and you're needing to push the car to, like, 100 miles an hour," Cohlmia says. "But, you know, you haven't changed the oil. You haven't checked the tires. The check-engine light's been on."
Across the state in Columbia, Public Health planning supervisor Rebecca Roesslet says her department still needs to manually transfer 12,000 records, over half of the county's COVID-19 cases so far. It's a painstaking process of copying and pasting data points, field by field.
"That's not our priority. Right now, our priority is in contacting people who have tested positive for COVID and providing them with the education they need and identifying their close contacts," Roesslet says.
In Texas, the state Health Department launched a new COVID-19 tracking system in May 2020, but most large counties had already developed their own systems by then. This only added more patches onto public health's quilt of tracking systems that don't automatically communicate with each other.
Austin Public Health's chief epidemiologist, Janet Pichette, says they wanted to collect more data than the state was gathering. They also didn't want to rely on an outside system that might go down unexpectedly.
"Once you are a person who works in data and epidemiology, you become very territorial. You want to have control of your own data, right?" Pichette says.
Just like the CDC can't mandate what state health departments must do, states can't always tell counties or cities what to do.
"I wouldn't touch that with a 10-foot pole," says Diana Cervantes. She currently teaches epidemiology at the University of North Texas, but she supervised a 49-county region with the state Health Department until 2018.
"We basically prefer this hands-off approach. We don't want to start getting in power struggles with the locals," Cervantes says. "The very last thing they're going to worry about is the state because they're not accountable to them."
Local health officials report to their county leaders, people like Ellis County Judge Todd Little near Dallas. He had a full-time staffer working to reconcile county data with what he called "unreliable" state data, but they abandoned that effort in June.
"We've done a successful job in mitigating a suburban county's spread. And at this point, we're ready to move on with our lives and experience the freedom that all Texans get to experience on a daily basis. We're ready to move on," Little says.
Can the data holes be patched?
Even in an imaginary world with all the right technology and plenty of workers, filling in missing data isn't always easy. Patients might not fit neatly into the CDC's standard race or sex boxes, or they might not want to reveal personal details.
"It can be loud, angry, violent screaming and those kinds of things," Cohlmia says, describing his case investigation calls with COVID-19 patients in the Kansas City area. "There have been death threats on our office. There have been protests outside."
Several states told NPR the pandemic outstripped their technical capabilities and overwhelmed their limited staff, but some promise that fixes are on the horizon.
Spokesperson Lisa Cox from the Missouri Department of Health and Senior Services says the department is looking for a way to remove duplicate records before it sends data to the CDC. She expects that fix to be in place by late September.
A Texas Health Department spokesperson says the department expects to transfer case data from various counties into the CDC's system by October. She blames Texas' decentralized public health system for the delay.
No state has perfect COVID-19 data. But most, including two dozen other states with decentralized public health systems, have figured out how to send what they do have to the CDC. This allows researchers, nonprofit organizations or any interested citizen to check the data and make fair comparisons across state lines of who's most affected by COVID-19.
"We should have these data at this point," says Krieger, the Harvard epidemiologist. "The answer to having not-good-enough data is to make it really public that it's not good enough and to figure out, how do you make it better?"
Last fall, California's public health director resigned because a data glitch with the state's tracking system meant thousands of records were missing. California paid a tech firm $15 million for a new system that could keep up. Within a few months, California's COVID-19 cases were flowing back into the CDC data set.
This summer, the CDC awarded $200 million in COVID-19 relief funds to states to modernize data systems. Yoon, the CDC's surveillance director, says that this money has helped thousands of hospitals add electronic reporting, but there's a long way to go. It'll take sustained funding, a skilled workforce and cooperation from states and counties to keep going.
"We can't go it alone," Yoon says. "It's not a one-and-done situation where you modernize it and then you're done and you can walk away."
But Texas alone now has over 3 million cases to submit, and Missouri, Louisiana, Mississippi and a handful of other states each have hundreds of thousands to work through.
As the delta variant continues its surge, their to-do lists will just keep growing.
Methodology
The CDC's COVID-19 case surveillance data is updated twice monthly and includes all cases reported, with a 14-day lag. To calculate the approximate number of COVID-19 patients missing from each state in each update since June 2020, NPR compared the number of records present in each update with that state's cumulative case count from 14 days earlier, to account for the reporting lag.
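The lag-adjusted comparison described above can be sketched in a few lines. This is a minimal illustration, not NPR's actual analysis code: the function name `estimate_missing` and the shape of its inputs (a plain count of records in one update and a hypothetical date-to-cumulative-count mapping for one state) are assumptions made for the example.

```python
from datetime import date, timedelta

# Per the methodology, each update is compared with case counts from 14 days earlier.
REPORTING_LAG = timedelta(days=14)


def estimate_missing(records_present: int,
                     cumulative_cases: dict[date, int],
                     update_date: date) -> tuple[int, float]:
    """Hypothetical helper: estimate cases absent from one CDC update for one state.

    records_present:  number of case records the state has in this update.
    cumulative_cases: assumed mapping of date -> state's cumulative case count.
    update_date:      date the CDC data set was updated.
    Returns (estimated missing cases, missing share of known cases).
    """
    # Look up the cumulative count 14 days before the update to offset the lag.
    expected = cumulative_cases[update_date - REPORTING_LAG]
    missing = expected - records_present
    return missing, missing / expected
```

For instance, with round figures echoing the Texas example (about 3 million known cases, 81,000 records; the dates here are illustrative, not the real series), `estimate_missing(81_000, {date(2021, 8, 3): 3_000_000}, date(2021, 8, 17))` gives 2,919,000 missing cases, a share of about 97 percent.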
The data set contains 29 variables from standardized COVID-19 case report forms. To calculate the number of unusable fields, NPR found how many records were marked "Missing" or "Unknown" for each variable.
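A per-variable tally of unusable values might look like the sketch below. It assumes each record is a dictionary keyed by variable name; the field names in the usage note are hypothetical placeholders, not the CDC's actual column names. Counting empty or absent values alongside "Missing" and "Unknown" is also an assumption, based on the article's note that some fields were simply left blank.

```python
from collections import Counter

# Sentinel labels named in the methodology.
UNUSABLE_LABELS = {"Missing", "Unknown"}


def unusable_counts(records: list[dict[str, str]], variables: list[str]) -> Counter:
    """Count, for each variable, how many records hold an unusable value.

    Assumption: a value is unusable if it is one of the sentinel labels,
    an empty string, or absent from the record entirely.
    """
    counts: Counter = Counter()
    for record in records:
        for var in variables:
            value = record.get(var)
            if value is None or value == "" or value in UNUSABLE_LABELS:
                counts[var] += 1
    return counts
```

With three hypothetical records, `[{"sex": "Female", "race": "Unknown"}, {"sex": "Missing", "race": "White"}, {"sex": "Male"}]`, and the variables `["sex", "race"]`, the tally is one unusable `sex` value and two unusable `race` values (one "Unknown", one absent). Dividing each count by the number of records gives the unusable share for that variable.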
The data used in this report comes from the most recent update, on Aug. 17.
Copyright 2021 NPR. To see more, visit https://www.npr.org.