A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction
Joint Authors
Bashir, Kamal
Li, Tianrui
Yahya, Mahama
Source
The International Arab Journal of Information Technology
Issue
Vol. 17, Issue 5 (30 Sep. 2020), pp.721-730, 10 p.
Publisher
Zarqa University Deanship of Scientific Research
Publication Date
2020-09-30
Country of Publication
Jordan
No. of Pages
10
Abstract EN
The most frequently used machine learning feature ranking approaches fail to produce an optimal feature subset for accurate prediction of defective software modules in out-of-sample data.
Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), RelieF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after the class distribution in the training data has been balanced.
In this study, we propose a novel FS method based on the Maximum Likelihood Logistic Regression (MLLR).
We apply this method to six software defect datasets, in both their sampled and unsampled forms, to select useful features for classification in the context of Software Defect Prediction (SDP).
The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the feature subsets selected from the sampled and unsampled datasets.
Model performance, measured by the Area Under the Receiver Operating Characteristic Curve (AUC), is compared across all FS methods considered.
The Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the FS techniques, both in sampled and unsampled data.
The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
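The abstract does not say how MLLR turns a fitted model into a feature ranking, so the following is only a minimal sketch under stated assumptions: a logistic regression is fitted by maximum likelihood (plain gradient ascent), and features are ranked by the absolute value of their coefficients on standardized inputs. The ranking criterion, the synthetic imbalanced data, and all function names are illustrative assumptions and are not taken from the paper. The SVM and RaF classifiers and the ANOVA F-test are not reproduced here.

```python
import math
import random

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_mle(X, y, lr=0.1, epochs=500):
    """Maximise the mean log-likelihood by batch gradient ascent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = target - sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b += lr * grad_b / n
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

def rank_features(X, y):
    """Assumed criterion: order feature indices by |coefficient|."""
    w, _ = fit_logistic_mle(standardize(X), y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

def auc(scores, labels):
    """AUC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_toy_data(n=300, seed=0):
    """Synthetic imbalanced data: feature 0 drives the label, 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(4)]
        # Negative intercept keeps the positive (defective) class in the minority.
        p = sigmoid(2.5 * row[0] - 2.0)
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y

X, y = make_toy_data()
ranking = rank_features(X, y)          # feature indices, most useful first
top = ranking[0]
top_auc = auc([row[top] for row in X], y)  # AUC of the top feature used alone
```

In a study like the one described, the top-ranked features would then be passed to a classifier and compared against subsets chosen by the other FS methods; here the top feature's own AUC stands in for that step.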
American Psychological Association (APA)
Bashir, Kamal & Li, Tianrui & Yahya, Mahama. 2020. A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction. The International Arab Journal of Information Technology, Vol. 17, no. 5, pp.721-730.
https://search.emarefa.net/detail/BIM-1439746
Modern Language Association (MLA)
Bashir, Kamal…[et al.]. A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction. The International Arab Journal of Information Technology Vol. 17, no. 5 (Sep. 2020), pp.721-730.
https://search.emarefa.net/detail/BIM-1439746
American Medical Association (AMA)
Bashir, Kamal & Li, Tianrui & Yahya, Mahama. A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction. The International Arab Journal of Information Technology. 2020. Vol. 17, no. 5, pp.721-730.
https://search.emarefa.net/detail/BIM-1439746
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references : p. 729-730
Record ID
BIM-1439746