Enhancing Diabetes Mellitus Onset Prediction through Advanced Ensemble Learning Techniques
Abstract
Type 2 diabetes is a major global health problem, and timely intervention depends on accurate, effective prediction models. Traditional machine learning (ML) models often underperform on imbalanced datasets and complex data relationships, which lowers their predictive accuracy. This study applies ensemble methods, including random forest, boosting, bagging, and stacking, to improve diabetes onset prediction using data from the Pima Indians Diabetes Database balanced with the synthetic minority over-sampling technique (SMOTE). The research involves extensive data processing, feature engineering, and cross-validation. Model performance is assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). The findings indicate that ensemble techniques, especially random forest, bagging, and boosting, outperform traditional models, achieving an accuracy of 88%, a recall of 82%, and a precision of 85%. These findings underscore the effectiveness of ensemble learning in predictive analytics for healthcare, supporting early diagnosis and personalized patient care. Future research should explore integrating deep learning models with diverse datasets to improve predictive accuracy and generalizability.
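To illustrate the class-balancing step mentioned in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the study's implementation. The function names `smote` and `balance`, the parameter defaults, and the toy feature values are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_new, 0), k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Toy data (hypothetical values, NOT taken from the Pima Indians Diabetes Database):
# two features per sample; label 1 is the minority class.
X = [(85.0, 22.1), (90.0, 24.3), (99.0, 26.0), (105.0, 27.5), (110.0, 25.2),
     (95.0, 23.8), (160.0, 35.0), (172.0, 38.4), (181.0, 33.9)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y, k=2)
print(y_bal.count(0), y_bal.count(1))
```

When oversampling is combined with cross-validation, it is generally applied to the training folds only, so that synthetic points derived from held-out samples do not leak into evaluation. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would likely be preferred over a hand-written version like this one.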