A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction

Joint Authors

Bashir, Kamal
Li, Tianrui
Yahya, Mahama

Source

The International Arab Journal of Information Technology

Issue

Vol. 17, Issue 5 (30 Sep. 2020), pp.721-730, 10 p.

Publisher

Zarqa University Deanship of Scientific Research

Publication Date

2020-09-30

Country of Publication

Jordan

No. of Pages

10

Main Subjects

Electronic engineering

Abstract EN

The most frequently used machine learning feature ranking approaches fail to produce an optimal feature subset for accurate prediction of defective software modules in out-of-sample data.

Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), RelieF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after the class distribution in the training data has been balanced.

In this study, we propose a novel FS method based on the Maximum Likelihood Logistic Regression (MLLR).
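As a rough illustration of the idea, the sketch below fits a logistic regression by maximizing the log-likelihood and ranks features by the magnitude of their fitted coefficients. The abstract does not state the ranking criterion the authors use, so ranking by absolute coefficient on standardized features is an assumption, and the names `standardize`, `fit_logistic_mle`, and `rank_features` are hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(X):
    # Scale every feature to zero mean and unit variance so that
    # coefficient magnitudes are comparable across features.
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
        for j in range(d)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]

def fit_logistic_mle(X, y, lr=0.1, epochs=2000):
    # Maximize the Bernoulli log-likelihood by batch gradient ascent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = yi - p  # derivative of the log-likelihood w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

def rank_features(X, y):
    # ASSUMPTION: rank by |coefficient|; the paper may use another criterion.
    w, _ = fit_logistic_mle(standardize(X), y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
```

Calling `rank_features(X, y)` with a list of feature rows and 0/1 defect labels returns feature indices from most to least influential under this assumed criterion; the top-ranked indices would form the selected subset.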

We apply this method to six software defect datasets, in both their sampled and unsampled forms, to select useful features for classification in the context of Software Defect Prediction (SDP).

The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the feature subsets selected from the sampled and unsampled datasets.

The performance of the models, measured by the Area Under the Receiver Operating Characteristic Curve (AUC), is compared across all the FS methods considered.
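For reference, AUC can be computed directly from classifier scores as the probability that a randomly chosen defective module is scored above a randomly chosen non-defective one, with ties counted as one half. The function below is a minimal sketch of that definition; the name `auc_score` and the 0/1 label encoding are assumptions, not details taken from the paper.

```python
def auc_score(y_true, scores):
    # Split scores by class (1 = defective, 0 = non-defective).
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Count correctly ordered (positive, negative) pairs; ties count 0.5.
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because it depends only on how the scores order the two classes, this measure is not affected by the class ratio in the way plain accuracy is, which is one reason it is commonly used for imbalanced data.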

The Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the other FS techniques, on both sampled and unsampled data.
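The abstract does not describe the ANOVA design (factors, blocking, or post-hoc tests), so the following is only a sketch of a one-way F statistic, assuming each group holds the AUC values obtained with one FS method. The function name `anova_f` is hypothetical.

```python
def anova_f(groups):
    # One-way ANOVA: ratio of between-group to within-group mean squares.
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    # Degrees of freedom: k - 1 between groups, n - k within groups.
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value indicates that the mean AUC differs between FS methods by more than the variation within each method would explain; the corresponding p-value comes from the F distribution with `k - 1` and `n - k` degrees of freedom.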

The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.

American Psychological Association (APA)

Bashir, Kamal & Li, Tianrui & Yahya, Mahama. 2020. A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction. The International Arab Journal of Information Technology, Vol. 17, no. 5, pp.721-730.
https://search.emarefa.net/detail/BIM-1439746

Modern Language Association (MLA)

Bashir, Kamal…[et al.]. A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction. The International Arab Journal of Information Technology Vol. 17, no. 5 (Sep. 2020), pp.721-730.
https://search.emarefa.net/detail/BIM-1439746

American Medical Association (AMA)

Bashir, Kamal & Li, Tianrui & Yahya, Mahama. A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction. The International Arab Journal of Information Technology. 2020. Vol. 17, no. 5, pp.721-730.
https://search.emarefa.net/detail/BIM-1439746

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references : p. 729-730

Record ID

BIM-1439746