
DOFCM-SMOTE: A technique to handle class imbalance problem

Shweta Sharma, A. Gosain

2025 · DOI: 10.47974/jios-1946
Journal of Information and Optimization Sciences · Citations: 0

TLDR

The findings of this study center on the clustering process, in which outliers are detected before the minority and majority class clusters are identified, and on a general pre-processing technique to balance the distribution of classes.

Abstract

Class imbalance is a common issue in real-world datasets, where the class of interest has relatively few instances compared with the others. Classifier performance is strongly affected by impurities in the data such as class imbalance, noise, and class overlap; therefore, an oversampling technique is proposed that fuses density-oriented fuzzy c-means clustering with the synthetic minority oversampling technique (DOFCM-SMOTE). The density-oriented approach is intended to find outliers that would otherwise degrade performance. The proposed method works in three phases: 1) it detects and eliminates outliers in the dataset, 2) it assigns membership degrees to the given samples by fuzzy c-means clustering, and 3) it applies SMOTE to the minority cluster to balance the distribution of classes. To evaluate performance, a decision tree classifier is employed as the learner. The study was conducted on ten publicly available datasets with imbalance ratios (IR) ranging from low to high. The findings of this study center on the clustering process, in which outliers are detected before the minority and majority class clusters are identified, and on a general pre-processing technique to balance the distribution of classes. The proposed approach is compared with four other oversampling techniques to determine the best balancing algorithm. Performance is assessed through AUC and G-mean, and the proposed technique is reported to significantly outperform the other oversampling methods.
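
The three phases described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract does not give the density criterion, the parameters, or how the minority cluster is selected, so the neighbour-count outlier rule, the `radius` and `min_neighbors` parameters, the choice of the cluster holding most minority samples, and all function names are assumptions made here for illustration.

```python
import numpy as np

def remove_outliers(X, y, radius, min_neighbors):
    """Phase 1 (assumed rule): drop points with too few neighbours within `radius`."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    counts = (d <= radius).sum(axis=1) - 1  # exclude the point itself
    keep = counts >= min_neighbors
    return X[keep], y[keep]

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Phase 2: standard fuzzy c-means; returns memberships U (n x c) and centres."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

def smote(X_min, n_new, k=5, seed=0):
    """Phase 3: interpolate between a minority sample and one of its k nearest
    minority neighbours. Needs at least two minority samples."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def dofcm_smote_sketch(X, y, minority=1, radius=1.0, min_neighbors=2):
    """Chain the three phases and return a class-balanced dataset."""
    X, y = remove_outliers(X, y, radius, min_neighbors)
    U, _ = fuzzy_c_means(X, c=2)
    cluster = U.argmax(axis=1)
    # Assumption: the "minority cluster" is the one holding most minority samples.
    min_cluster = np.bincount(cluster[y == minority], minlength=2).argmax()
    seeds = X[(y == minority) & (cluster == min_cluster)]
    n_new = max(0, int((y != minority).sum() - (y == minority).sum()))
    X_new = smote(seeds, n_new)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])
```

Calling `dofcm_smote_sketch(X, y)` on a two-class array returns the outlier-filtered samples followed by the synthetic minority points. The result could then be passed to a decision tree learner and scored with AUC and G-mean, as the abstract describes.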
