Adjusting for Population Differences in the National Cancer Database to Better Represent United States Cancer Cases: A Reference Tool for Researchers
database[Title] 2025-04-20
Ann Surg Oncol. 2025 Apr 18. doi: 10.1245/s10434-025-17285-x. Online ahead of print.
ABSTRACT
BACKGROUND: The National Cancer Database (NCDB) is widely used in US cancer outcomes research, but its reliance on Commission on Cancer-approved hospitals can underrepresent certain populations, skew data, and limit the generalizability of findings. Published estimates of its representativeness extend only through 2014. We sought to adjust NCDB cancer cases to better reflect the total US cancer population in a way that is useful for cancer outcomes research.
METHODS: Incident cancer cases in the NCDB from 2016-2020 were compared with those in the US Cancer Statistics (USCS) database, which contains nearly 100% of new cancer cases. NCDB case coverage was defined as the percentage of USCS cases that are captured in the NCDB. Coverage was determined for the entire cohort (age 20+ years), and sub-analyses were performed by age, sex, race/ethnicity, residence location, and cancer site.
RESULTS: From 2016-2020, 6,515,675 cancer cases were recorded in the NCDB and 9,311,593 in the USCS, yielding 70% NCDB case coverage over the 5 years; coverage increased from 68% to 73% during this period. Case coverage was lowest among men (65%), 85+-year-olds (59%), American Indian/Alaskan Native people (42%), and Hispanic/Latino individuals (55%). The Mountain region was the least represented region (49%), and nonmetropolitan residence the least represented residence type (64%). Similar underrepresentation was seen across the top cancer sites. Missingness of data was also captured.
CONCLUSIONS: Although the NCDB's representation of US cancer cases is improving, gaps remain by age, sex, race/ethnicity, and residence location, and these are further exacerbated by missing variables. We provide investigators using the NCDB with a way to represent cancer case data so that they can better tailor research questions and frame outcomes.
PMID:40251365 | DOI:10.1245/s10434-025-17285-x
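
As an illustration of the coverage definition in METHODS, the short Python sketch below recomputes the overall figure from the two case counts reported in RESULTS. The function name `case_coverage` is hypothetical and is not taken from the paper; the sketch only shows the arithmetic of the definition and is not the authors' adjustment tool.

```python
def case_coverage(ncdb_cases: int, uscs_cases: int) -> float:
    """Return NCDB case coverage as a percentage of USCS cases.

    Hypothetical helper: coverage = 100 * NCDB cases / USCS cases,
    following the definition given in the abstract's METHODS.
    """
    if uscs_cases <= 0:
        raise ValueError("uscs_cases must be positive")
    return 100.0 * ncdb_cases / uscs_cases


# Overall 2016-2020 counts as reported in RESULTS.
overall = case_coverage(6_515_675, 9_311_593)
print(f"Overall NCDB case coverage: {overall:.1f}%")
```

The same function could be applied to any subgroup (for example, a single age band or region) once the matching NCDB and USCS counts for that subgroup are available; those subgroup counts are not given in the abstract.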