Study of data mining algorithms using a dataset from the size-effect on open source software defects

Other Title(s)

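دراسة خوارزميات تنقيب البيانات باستخدام مجموعة بيانات من تأثير الحجم على عيوب البرمجيات مفتوحة المصدر (A study of data mining algorithms using a dataset from the effect of size on open source software defects)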

Joint Authors

Isawi, Muthanna Yasin Nawwaf
Rashid, Maidah Muhsin

Source

Kirkuk University Journal-Scientific Studies

Issue

Vol. 15, Issue 2 (30 Jun. 2020), pp.25-44, 20 p.

Publisher

Kirkuk University College of Science

Publication Date

2020-06-30

Country of Publication

Iraq

No. of Pages

20

Main Subjects

Information Technology and Computer Science

Abstract EN

This article evaluates the quality of data mining algorithms in terms of accuracy ratio and time consumption.

To identify the best algorithms among the classification and clustering algorithms, all of them are tested in the WEKA program using a real dataset from a study of the size effect on defect proneness in open source software.

The Mozilla product is adopted as an example of open source software.

The dataset used in this paper is the output of the study of the size effect on defect proneness in open source software.

The study of the Mozilla product shows a significant relationship between software size and defect proneness.

The Mozilla product study produced the dataset that serves as input to WEKA for comparing the data mining algorithms.

We use Naive Bayes, the J48 decision tree, and Expectation-Maximization for classification, and K-Star and Simple KMeans for clustering.
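To make the first classifier in this list concrete, the following is a minimal Gaussian Naive Bayes sketch in plain Python. It is not WEKA's NaiveBayes implementation and not the authors' code; it assumes every attribute is numeric (for example a size metric) and models each attribute, per class, with a normal distribution. The helper names `fit_gaussian_nb` and `predict_gaussian_nb` and the toy rows are hypothetical and are not taken from the Mozilla dataset.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate each class prior plus a per-attribute mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, max(var, 1e-9)))  # floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for a single row."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            # Log of the normal density N(x; mean, var).
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: [size in KLOC, number of changes] -> label
X = [[0.2, 1], [0.3, 2], [0.25, 1], [2.0, 9], [2.5, 12], [1.8, 10]]
y = ["clean", "clean", "clean", "defective", "defective", "defective"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [2.2, 11]))  # -> defective
```

Working in log space keeps the product of many small likelihoods from underflowing; the variance floor is an arbitrary choice that avoids division by zero when an attribute is constant within a class.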

The findings show how the algorithms differ in accuracy and in the time each one takes to reach its result.
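Accuracy is straightforward for classifiers, but a clustering algorithm such as Simple KMeans produces unlabeled clusters, so one extra step is needed: each cluster is mapped to a class and the matching instances are counted. The sketch below assumes plain Lloyd's k-means and a simplified majority-vote mapping; WEKA's own SimpleKMeans and its classes-to-clusters evaluation may differ in initialization and in how clusters are mapped to classes. The function names `simple_kmeans` and `classes_to_clusters_accuracy` and the toy data are hypothetical.

```python
import random
from collections import Counter

def simple_kmeans(points, k, iterations=100, seed=1):
    """Lloyd's k-means with random initial centroids; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        new_assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

def classes_to_clusters_accuracy(assignments, labels):
    """Map each cluster to its majority class; return the fraction of matches."""
    correct = 0
    for cluster in set(assignments):
        members = [label for a, label in zip(assignments, labels) if a == cluster]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)

# Hypothetical toy data: [size in KLOC, number of changes] with known labels
points = [[0.2, 1], [0.3, 2], [0.25, 1], [2.0, 9], [2.5, 12], [1.8, 10]]
labels = ["clean", "clean", "clean", "defective", "defective", "defective"]
centroids, assignments = simple_kmeans(points, k=2)
print(classes_to_clusters_accuracy(assignments, labels))  # -> 1.0
```

Majority-vote mapping can let two clusters map to the same class, which is why it can report a higher accuracy than a one-to-one mapping on harder data.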

Furthermore, software size has a significant effect on defect proneness.

All experiments are conducted in WEKA, with the aim of finding the best algorithm in terms of accuracy and time consumption.
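The two measures named here, accuracy ratio and time consumption, can be captured with a small harness like the one below. It is a sketch under stated assumptions: a single train/test split (WEKA's Explorer defaults to 10-fold cross-validation instead), wall-clock time measured with `time.perf_counter`, and two stand-in classifiers (a majority-class baseline and 1-nearest-neighbour) in place of the algorithms from the paper. The helper names such as `evaluate`, `fit_1nn` and `predict_1nn`, and the toy data, are hypothetical.

```python
import time
from collections import Counter

def evaluate(name, fit, predict, train_X, train_y, test_X, test_y):
    """Train and test one classifier; return (name, accuracy ratio, seconds)."""
    start = time.perf_counter()
    model = fit(train_X, train_y)
    predictions = [predict(model, row) for row in test_X]
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
    return name, accuracy, elapsed

# Stand-in classifier 1: always predict the most frequent training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, row):
    return model

# Stand-in classifier 2: 1-nearest-neighbour by squared Euclidean distance.
def fit_1nn(X, y):
    return list(zip(X, y))

def predict_1nn(model, row):
    nearest = min(model, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], row)))
    return nearest[1]

# Hypothetical toy data: [size in KLOC, number of changes] -> label
train_X = [[0.2, 1], [0.3, 2], [0.25, 1], [2.0, 9], [2.5, 12], [1.8, 10]]
train_y = ["clean", "clean", "clean", "defective", "defective", "defective"]
test_X = [[0.28, 1], [2.2, 11], [0.22, 2], [1.9, 9]]
test_y = ["clean", "defective", "clean", "defective"]

results = [
    evaluate("majority", fit_majority, predict_majority, train_X, train_y, test_X, test_y),
    evaluate("1-NN", fit_1nn, predict_1nn, train_X, train_y, test_X, test_y),
]
for name, accuracy, seconds in results:
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```

On inputs this small the timings are dominated by noise; a meaningful comparison needs the real dataset and repeated runs.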

Finally, the paper identifies the best algorithm to choose and rely on for classification and clustering tasks.

American Psychological Association (APA)

Isawi, Muthanna Yasin Nawwaf & Rashid, Maidah Muhsin. 2020. Study of data mining algorithms using a dataset from the size-effect on open source software defects. Kirkuk University Journal-Scientific Studies, Vol. 15, no. 2, pp. 25-44.
https://search.emarefa.net/detail/BIM-1037082

Modern Language Association (MLA)

Isawi, Muthanna Yasin Nawwaf & Rashid, Maidah Muhsin. Study of data mining algorithms using a dataset from the size-effect on open source software defects. Kirkuk University Journal-Scientific Studies Vol. 15, no. 2 (Jun. 2020), pp. 25-44.
https://search.emarefa.net/detail/BIM-1037082

American Medical Association (AMA)

Isawi, Muthanna Yasin Nawwaf & Rashid, Maidah Muhsin. Study of data mining algorithms using a dataset from the size-effect on open source software defects. Kirkuk University Journal-Scientific Studies. 2020. Vol. 15, no. 2, pp. 25-44.
https://search.emarefa.net/detail/BIM-1037082

Data Type

Journal Articles

Language

English

Notes

-

Record ID

BIM-1037082