An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data

Joint Authors

He, Peng
Li, Bing
He, Yao
Yu, Lvjun

Source

Mathematical Problems in Engineering

Issue

Vol. 2018, Issue 2018 (31 Dec. 2018), pp.1-18, 18 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2018-06-07

Country of Publication

Egypt

No. of Pages

18

Main Subjects

Civil Engineering

Abstract EN

Cross-project defect prediction (CPDP) on projects with limited historical data has attracted much attention.

To the best of our knowledge, however, existing approaches usually perform poorly because the cross-project training data are of low quality.

The objective of this study is to propose an improved method for CPDP, named TDSelector, that simplifies the training data by considering both the similarity and the number of defects of each training instance (denoted by defects), and to demonstrate the effectiveness of the proposed method.

Our work consists of three main steps.

First, we constructed TDSelector as a linear weighted function of instances’ similarity and defects.

Second, the basic defect predictor used in our experiments was built by using the Logistic Regression classification algorithm.

Third, we analyzed how different combinations of similarity measures and normalization methods for defects affect prediction performance and then compared our method with two existing methods.

We evaluated our method on 14 projects collected from two public repositories.

The results suggest that the proposed TDSelector method performs, on average, better than both baseline methods, increasing the AUC values by up to 10.6% and 4.3%, respectively.

That is, including defects does help to select high-quality training instances for CPDP.

In addition, the combination of Euclidean distance and linear normalization is the preferred configuration for TDSelector.

An additional experiment also shows that selecting those instances with more bugs directly as training data can further improve the performance of the bug predictor trained by our method.
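
The selection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that TDSelector is a linear weighted function of similarity and defects, with Euclidean distance and linear normalization as the preferred combination. The weight `alpha`, the conversion of distance into similarity as 1 / (1 + distance), the use of the nearest target instance, the reading of linear normalization as min-max scaling, and all function names below are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max(values):
    """Linear (min-max) normalization to [0, 1]; assumed reading of 'linear normalization'."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is nothing to rank on.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_instances(train_features, train_defects, target_features, alpha=0.5):
    """Score each training instance as alpha * similarity + (1 - alpha) * normalized defects.

    Assumption: similarity is 1 / (1 + distance to the nearest target instance).
    """
    sims = []
    for row in train_features:
        nearest = min(euclidean(row, t) for t in target_features)
        sims.append(1.0 / (1.0 + nearest))
    norm_defects = min_max(train_defects)
    return [alpha * s + (1 - alpha) * d for s, d in zip(sims, norm_defects)]


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring training instances, in index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


if __name__ == "__main__":
    # Toy data: three cross-project training instances and one target instance.
    train = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
    defects = [0, 4, 2]
    target = [[0.0, 0.0]]
    scores = score_instances(train, defects, target, alpha=0.5)
    print(select_top_k(scores, 2))
```

Under these assumptions, `alpha = 1.0` ranks instances by similarity alone and `alpha = 0.0` by defect count alone. The selected instances would then be used to train a classifier such as the Logistic Regression predictor mentioned above.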

American Psychological Association (APA)

He, Peng & He, Yao & Yu, Lvjun & Li, Bing. 2018. An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data. Mathematical Problems in Engineering, Vol. 2018, no. 2018, pp. 1-18.
https://search.emarefa.net/detail/BIM-1206378

Modern Language Association (MLA)

He, Peng…[et al.]. An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data. Mathematical Problems in Engineering No. 2018 (2018), pp.1-18.
https://search.emarefa.net/detail/BIM-1206378

American Medical Association (AMA)

He, Peng & He, Yao & Yu, Lvjun & Li, Bing. An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data. Mathematical Problems in Engineering. 2018. Vol. 2018, no. 2018, pp. 1-18.
https://search.emarefa.net/detail/BIM-1206378

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1206378