Building and validating a machine learning-based survival prediction model for early gastric cancer: a SEER database analysis


Surg Oncol. 2026 Mar 31;66:102416. doi: 10.1016/j.suronc.2026.102416. Online ahead of print.

ABSTRACT

BACKGROUND: Early gastric cancer (EGC) represents a substantial disease burden. To support its management, this study aimed to construct time-interval survival classification models for patients with EGC based on machine learning (ML) algorithms.

METHODS: This retrospective study analyzed data from the Surveillance, Epidemiology, and End Results (SEER) database covering the period from 2000 to 2021. Univariate logistic regression and least absolute shrinkage and selection operator (LASSO) regression were used to identify potential risk factors influencing the survival of patients with EGC. The performance of nine ML models was evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC), and decision curve analysis (DCA). Finally, the impact of the variables on 1-3-year, 3-5-year, and 5-10-year survival outcomes was assessed using Shapley additive explanations (SHAP).
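
The two evaluation measures named above can be illustrated with a minimal sketch. It is not the study's code: the abstract does not state which software was used, and the function names (`roc_auc`, `net_benefit`) and the toy labels and probabilities below are hypothetical. The sketch assumes the standard rank-based definition of AUC and the usual DCA net-benefit formula, TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit at one threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt), treating scores >= pt as positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical, not taken from SEER): 1 = event, 0 = no event.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print("AUC:", roc_auc(y_true, y_prob))
print("Net benefit at pt=0.5:", net_benefit(y_true, y_prob, 0.5))
# "Treat all" reference strategy: every patient is classified as positive.
print("Treat-all at pt=0.5:", net_benefit(y_true, [1.0] * len(y_true), 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the model against the treat-all and treat-none (net benefit 0) strategies.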

RESULTS: A total of 4088 patients with EGC were included. For 1-3-year survival prediction, logistic regression (LR) achieved the highest AUC (0.8142). For 3-5-year survival, k-nearest neighbors (KNN) achieved the highest AUC (0.6362). For 5-10-year survival, the multilayer perceptron (MLP) achieved the highest AUC (0.692). SHAP analysis indicated that age was the most common predictor across all groups. The primary tumor site, tumor size, and histologic type significantly affected survival in the 1-3-year, 3-5-year, and 5-10-year groups, respectively.

CONCLUSION: This study develops and validates an ML-based survival prediction model for EGC. These findings demonstrate the methodological potential of ML for understanding prognostic factors in cancer patient management.

PMID:41980499 | DOI:10.1016/j.suronc.2026.102416