Handling Multiclass Imbalance in Diabetes, Cancer, and Pneumonia Classification Using NR-Clustering SMOTE

Authors

DOI:

https://doi.org/10.71129/ijaci.v2i2.pp83-95

Keywords:

Multiclass classification, SMOTE, NR-Clustering, Medical diagnosis, Imbalanced data

Abstract

Imbalanced data in multiclass health classification often biases model predictions, particularly against underrepresented but critical disease classes such as cancer and pneumonia. Traditional oversampling techniques such as SMOTE tend to generate noisy samples and increase class overlap, which limits their effectiveness in such complex domains. This research addresses multiclass imbalance in the classification of diabetes, cancer, and pneumonia by proposing an improved oversampling technique, NR-Clustering SMOTE, which integrates K-Means clustering and Euclidean distance. The proposed method first filters noisy data using k-NN, then clusters the minority class data with K-Means (optimized via the Silhouette Score), and finally applies SMOTE within each cluster using Euclidean distance. This keeps sample generation localized, minimizes noise, and reduces class overlap. The balanced dataset is then evaluated using ten machine learning algorithms, including Extra Trees, Random Forest, and Stacked Ensemble. Experimental results show significant improvements in classification metrics, especially for minority classes. For instance, after oversampling, Extra Trees achieved 89% accuracy and an AUC of 0.97, compared with only 48% accuracy and an AUC of 0.50 on the original dataset. This indicates that NR-Clustering SMOTE improves classifier sensitivity toward minority classes without compromising majority class performance. The improvement is consistent across the models tested, which supports the robustness of the proposed method. In conclusion, NR-Clustering SMOTE with Euclidean distance, combined with ensemble classifiers such as Extra Trees, is a promising solution for handling multiclass imbalanced health data, particularly in domains that require accurate detection of minority diseases.
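
The three steps named in the abstract (k-NN noise filtering, K-Means clustering selected by Silhouette Score, and SMOTE within each cluster using Euclidean distance) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the exact rules, so the noise criterion (drop a minority point when none of its k nearest neighbours shares its class), the candidate cluster counts, and every function and parameter name below (`knn_filter`, `nr_clustering_smote`, `n_new`, `k_range`) are assumptions made for illustration, and the data is a toy example.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_filter(X, y, target, k=3):
    """Assumed noise rule: drop a target-class point if none of its
    k nearest neighbours (over all classes) shares its class."""
    kept = []
    for i, p in enumerate(X):
        if y[i] != target:
            continue
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: dist(p, X[j]))[:k]
        if any(y[j] == target for j in nearest):
            kept.append(p)
    return kept


def kmeans(P, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(P, k)
    labels = [0] * len(P)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in P]
        for c in range(k):
            members = [p for p, lab in zip(P, labels) if lab == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def silhouette(P, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(P):
        mean_d = {}
        for c in clusters:
            ds = [dist(p, q) for j, q in enumerate(P) if labels[j] == c and j != i]
            mean_d[c] = sum(ds) / len(ds) if ds else None
        a = mean_d[labels[i]]
        if a is None:
            continue
        b = min(v for c, v in mean_d.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(P)


def smote_in_clusters(P, labels, n_new, rng, k=3):
    """Interpolate between a point and one of its k nearest
    neighbours drawn from the same cluster only."""
    groups = [[p for p, lab in zip(P, labels) if lab == c] for c in set(labels)]
    groups = [g for g in groups if len(g) >= 2]
    synthetic = []
    if not groups:
        return synthetic
    for _ in range(n_new):
        g = rng.choice(groups)
        p = rng.choice(g)
        nbrs = sorted((q for q in g if q is not p), key=lambda q: dist(p, q))[:k]
        q = rng.choice(nbrs)
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(p, q)])
    return synthetic


def nr_clustering_smote(X, y, target, n_new, k_range=(2, 3, 4), seed=0):
    """Filter noise, pick the clustering with the best silhouette, oversample."""
    rng = random.Random(seed)
    P = knn_filter(X, y, target)
    candidates = [kmeans(P, k, rng) for k in k_range if 2 <= k < len(P)]
    labels = max(candidates, key=lambda lab: silhouette(P, lab)) if candidates else [0] * len(P)
    return smote_in_clusters(P, labels, n_new, rng)


# Toy data: minority class 1 forms two tight groups plus one noisy point
# at (2.5, 2.5) that sits inside the majority class 0.
X = [[0, 0], [0.2, 0.1], [0.1, 0.3], [5, 5], [5.2, 5.1], [5.1, 4.8], [2.5, 2.5],
     [2, 2], [3, 2], [2, 3], [3, 3], [2.5, 1.5], [2.5, 3.5], [1.5, 2.5], [3.5, 2.5]]
y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
synthetic = nr_clustering_smote(X, y, target=1, n_new=5)
```

In this toy setup the noisy minority point is discarded before clustering, and each synthetic sample is interpolated between two points of the same cluster, so no sample is generated in the region occupied by the majority class. In a multiclass setting the same routine would presumably be applied to each minority class in turn.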

Published

2025-10-20
