Abstract
Thyroid cancer, the most common endocrine malignancy, usually has an excellent prognosis, but recurrence remains a major clinical issue, occurring in about 20–30% of patients. Traditional risk stratification models such as the TNM staging system and the American Thyroid Association (ATA) guidelines do not fully capture the intricate nonlinear interactions among the demographic, clinical, histopathological, and molecular factors that affect recurrence. This study investigates the use of machine learning (ML) models to predict thyroid cancer recurrence using retrospective data collected from three Nigerian tertiary centres. The study included 510 patients with differentiated thyroid cancer (DTC) who were treated between 2012 and 2022. The features examined were age, sex, tumour size, histological type, lymph node status, BRAF mutation status, serum thyroglobulin (Tg) level, extent of surgery, radioactive iodine (RAI) therapy, and follow-up outcomes. Preprocessing consisted of normalization, imputation, encoding, and class balancing with SMOTE. Four supervised ML algorithms (Logistic Regression, Support Vector Machine (SVM), Random Forest, and XGBoost) were trained and validated using 5-fold cross-validation with an 80:20 train-test split. Model performance was assessed by accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). The XGBoost model performed best, with 90% accuracy, 81% recall, 82% F1-score, and an AUC of 0.89. The Random Forest and SVM models also performed well, while Logistic Regression showed lower predictive power. Feature importance analysis of the XGBoost model identified thyroglobulin level, lymph node status, tumour size, BRAF mutation status, and age as the strongest predictors of recurrence.
These results indicate that machine learning algorithms, particularly ensemble models, can markedly improve on standard statistical techniques in identifying high-risk patients. This research lays a first foundation for developing predictive tools in the Nigerian health setting, with the potential to support clinical decision-making, tailor follow-up regimens, and improve long-term outcomes in thyroid oncology. Future research should explore external validation, the inclusion of additional genomic markers, and integration into real-time clinical decision-support systems.
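
For readers less familiar with the evaluation metrics named in this abstract, the following minimal Python sketch shows how accuracy, precision, recall, F1-score, and AUC-ROC can be computed for a binary recurrence label. It is an illustration only: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and are not taken from the study's data or code.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1-score as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 recurrences among 8 patients.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = classification_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_metrics, toy_auc)
```

Recall is the share of true recurrences that the model flags, which is why it matters most when the aim is to avoid missing high-risk patients. AUC-ROC, unlike the other four metrics, does not depend on the chosen decision threshold.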

This work is licensed under a Creative Commons Attribution 4.0 International License.