Advanced Weighted Approach for Class Imbalance Learning
DOI: https://doi.org/10.56294/dm2025719
Keywords: Representativeness, Sample weights, Calibration, Class imbalance, Support vector machine
Abstract
Predictive models built with statistical learning techniques often assume that the data originate from simple random sampling and therefore assign equal weight to every individual. This assumption faces two significant challenges: it overlooks the complexity of real samples, in which individuals may carry different sampling weights, and it biases the model toward the majority class in imbalanced datasets. In this study, we propose a novel approach that assigns differentiated weights to individuals by adjusting their sample weights through calibration. The method addresses class imbalance while improving the representativeness of the sample, and we apply it to the Support Vector Machine (SVM). We also develop an improved adjusted weighting approach that further enhances model performance, particularly on the minority class. This improved version combines two widely used techniques for handling class imbalance, resampling and cost-sensitive learning: it first balances the classes through resampling and then applies the adjusted sample weights during training. We evaluate the approach on real datasets with varying levels of imbalance using multiple evaluation metrics, and compare the results with conventional methods commonly employed to address class imbalance. Our findings demonstrate the relevance and generalizability of the proposed algorithms, which often match or exceed the performance of established competing methods. Overall, the methodology not only corrects class imbalance but also ensures a more accurate representation of the target population, making it a robust and flexible solution for real-world imbalanced classification problems.
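The two-step procedure the abstract describes (balance the classes by resampling, then train with per-sample weights) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it uses simple random oversampling and uniform placeholder weights, whereas the paper derives the weights by calibrating survey sampling weights to auxiliary information.

```python
# Hedged sketch of "resample, then train a sample-weighted SVM".
# Data, oversampling scheme, and weights are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Imbalanced toy data (roughly 90% majority / 10% minority).
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Step 1: balance the classes by randomly oversampling the minority class.
rng = np.random.default_rng(0)
minority = np.where(y_tr == 1)[0]
majority = np.where(y_tr == 0)[0]
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])
X_bal, y_bal = X_tr[idx], y_tr[idx]

# Step 2: train an SVM with per-sample weights. Uniform weights here stand
# in for the calibration-adjusted weights the paper constructs.
w = np.ones(len(y_bal))
clf = SVC(kernel="rbf").fit(X_bal, y_bal, sample_weight=w)

score = balanced_accuracy_score(y_te, clf.predict(X_te))
print(f"balanced accuracy: {score:.3f}")
```

In scikit-learn, any nonuniform weight vector passed to `sample_weight` rescales each sample's contribution to the SVM loss, which is the hook through which calibration-adjusted weights would enter training.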
License
Copyright (c) 2025 Lamyae Benhlima , Mohammed El Haj Tirari (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License. Unless otherwise stated, associated published material is distributed under the same licence.