Development and Validation of an Algorithm for Constructing an Amino Acid Database for Application to the Korean Genome and Epidemiology Study Cohort


Nutrients. 2026 Apr 2;18(7):1147. doi: 10.3390/nu18071147.

ABSTRACT

BACKGROUND/OBJECTIVES: The Korean Genome and Epidemiology Study (KoGES) is a large population-based cohort designed to investigate chronic disease risk using long-term dietary and health data. However, comprehensive amino acid information for estimating long-term intake from food frequency questionnaire (FFQ) data remains limited. This study aimed to develop and validate a standardized, rule-based algorithm for food matching and substitution and to construct an amino acid database applicable to the KoGES FFQ.

METHODS: The algorithm sequentially evaluated food name concordance, preparation form, the substitutability of similar foods, and whether differences in energy, macronutrient, and moisture content fell within ±20%. Amino acid composition data were derived from domestic and international food composition tables and published literature, and food-group-specific nitrogen-to-protein conversion factors were applied.
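The matching steps described in the Methods can be pictured as a cascade of checks. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. Several details are hypothetical: the `Food` fields, the order of the fallback steps, the `similar_names` substitutability list, and the rule that the ±20% tolerance applies to each of energy, protein, fat, carbohydrate, and moisture relative to the target food. The last function shows one common way to apply a nitrogen-to-protein conversion factor. It assumes amino acid values are reported per gram of nitrogen (mg/g N) and rescales them to mg per 100 g of food via the food's protein content. The factor passed in would vary by food group; 6.25 is the general default.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Food:
    """Hypothetical food composition entry; nutrient values per 100 g edible portion."""
    name: str
    preparation: str  # e.g. "raw", "boiled", "pan-fried"
    energy_kcal: float
    protein_g: float
    fat_g: float
    carbohydrate_g: float
    moisture_g: float


TOLERANCE = 0.20  # ±20%, as stated in the abstract
# Assumption: every one of these fields must fall within the tolerance.
COMPARED_FIELDS = ("energy_kcal", "protein_g", "fat_g", "carbohydrate_g", "moisture_g")


def within_tolerance(target: float, candidate: float, tol: float = TOLERANCE) -> bool:
    """Relative difference |candidate - target| / |target| must not exceed tol."""
    if target == 0:
        return candidate == 0
    return abs(candidate - target) / abs(target) <= tol


def comparable(target: Food, candidate: Food) -> bool:
    """True if all compared nutrients are within tolerance (assumed rule)."""
    return all(
        within_tolerance(getattr(target, f), getattr(candidate, f))
        for f in COMPARED_FIELDS
    )


def match_food(
    target: Food, candidates: List[Food], similar_names: Set[str]
) -> Tuple[Optional[Food], str]:
    """Return the first acceptable candidate and a label describing how it matched."""
    # Step 1: same name and same preparation form -> use directly.
    for c in candidates:
        if c.name == target.name and c.preparation == target.preparation:
            return c, "direct"
    # Step 2: same name, different preparation form, comparable composition.
    for c in candidates:
        if c.name == target.name and comparable(target, c):
            return c, "preparation_substitute"
    # Step 3: a food judged substitutable (hypothetical list), comparable composition.
    for c in candidates:
        if c.name in similar_names and comparable(target, c):
            return c, "similar_food_substitute"
    return None, "unmatched"


def aa_mg_per_100g(
    aa_mg_per_g_nitrogen: float, protein_g_per_100g: float, n_to_protein_factor: float
) -> float:
    """Convert an amino acid value from mg/g N to mg/100 g food (assumed input unit)."""
    nitrogen_g_per_100g = protein_g_per_100g / n_to_protein_factor
    return aa_mg_per_g_nitrogen * nitrogen_g_per_100g
```

Returning a label with each match makes it straightforward to tally how many values came from direct matches and how many came from substitutions.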

RESULTS: Amino acid information was established for 475 FFQ food items covering 19 amino acids. Of the database values, 31.0% were analytical, 64.2% were calculated, and 4.8% were substituted. Overall database coverage across all amino acid-food item combinations was 98.8%. When the database was applied to dietary data from the second follow-up (Phase 3) of the KoGES Ansan and Ansung community-based cohorts, total amino acid intake accounted for 86.7% of total protein intake; the difference reflects the non-protein nitrogen included in conventional protein estimates. The proportions of participants with intakes below the Estimated Average Requirement (EAR) for protein and essential amino acids varied across age and sex groups. Lysine had the highest proportion of participants below the EAR and tryptophan the lowest, both overall and in men and women separately.
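The summary figures in the Results are simple ratios. The sketch below uses hypothetical names and a hypothetical data layout. It computes three things: database coverage as the share of filled food-item × amino-acid cells (475 × 19 = 9,025 combinations here), the share of filled cells by data source, and the proportion of participants below the EAR within each sex and age group. The last step assumes an EAR cut-point approach, in which each participant's intake is compared with the EAR for their group. The abstract does not state the exact method or the units used, so intake and EAR must simply share a unit in this sketch.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: db[food][amino_acid] = (value_mg_per_100g, source) or None if missing.
Cell = Optional[Tuple[float, str]]


def coverage(db: Dict[str, Dict[str, Cell]], amino_acids: List[str]) -> float:
    """Share of food-item x amino-acid combinations that have a value."""
    cells = [db[food].get(aa) for food in db for aa in amino_acids]
    return sum(c is not None for c in cells) / len(cells) if cells else 0.0


def source_shares(db: Dict[str, Dict[str, Cell]], amino_acids: List[str]) -> Dict[str, float]:
    """Share of filled cells by data source (e.g. analytical, calculated, substituted)."""
    tags: Counter = Counter()
    for food in db:
        for aa in amino_acids:
            cell = db[food].get(aa)
            if cell is not None:
                tags[cell[1]] += 1
    total = sum(tags.values())
    return {tag: n / total for tag, n in tags.items()} if total else {}


@dataclass
class Participant:
    sex: str
    age_group: str
    intake: float  # must be in the same unit as the EAR it is compared with


def share_below_ear(
    participants: List[Participant], ear: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """Cut-point style estimate: share of each (sex, age_group) with intake below the EAR."""
    counts = defaultdict(lambda: [0, 0])  # (sex, age_group) -> [below, total]
    for p in participants:
        key = (p.sex, p.age_group)
        counts[key][1] += 1
        if p.intake < ear[key]:
            counts[key][0] += 1
    return {key: below / total for key, (below, total) in counts.items()}
```

Reporting the share per group rather than pooling everyone mirrors the abstract's breakdown by age and sex. An overall figure would count each participant against their own group's EAR and divide by the total number of participants.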

CONCLUSIONS: This standardized algorithm provides a reproducible framework for constructing amino acid databases and can be applied to large-scale cohort and dietary survey data.

PMID:41978197 | PMC:PMC13074795 | DOI:10.3390/nu18071147