Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Joint Authors

Yang, Jieming
Qu, Zhaoyang
Liu, Zhiying

Source

The Scientific World Journal

Issue

Vol. 2014, Issue 2014 (31 Dec. 2014), pp.1-17, 17 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2014-05-26

Country of Publication

Egypt

No. of Pages

17

Main Subjects

Medicine
Information Technology and Computer Science

Abstract EN

Filtering feature-selection algorithms are an important approach to dimensionality reduction in text categorization.

Most filtering feature-selection algorithms evaluate the significance of a feature for a category on the assumption of a balanced dataset and do not take the dataset's imbalance into account.

In this paper, we propose a new scheme that can weaken the adverse effect of imbalance in the corpus.

We evaluated the improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB).
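To illustrate the kind of term–category score these filtering methods compute, here is a minimal sketch of the standard chi-square statistic for one term t and one category c, computed from a 2×2 table of document counts. This is the common textbook baseline only, not the authors' improved, imbalance-aware scheme, which the abstract does not specify. The function name `chi_square` and the count names A, B, C, D are illustrative assumptions.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Baseline chi-square score of a term t for a category c (illustrative sketch).

    a: documents in category c that contain t
    b: documents outside c that contain t
    c: documents in category c that do not contain t
    d: documents outside c that do not contain t
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # Degenerate table (e.g. the term occurs in every document): no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


# A term concentrated in the category scores high; an evenly spread term scores 0.
print(chi_square(40, 10, 10, 40))  # 36.0
print(chi_square(10, 10, 10, 10))  # 0.0
```

Note that the score depends only on raw document counts, so when one category has far fewer documents than the rest, its terms are scored from a much smaller and skewed table; this is the general situation the abstract refers to as the imbalance problem.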

The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods.

American Psychological Association (APA)

Yang, Jieming & Qu, Zhaoyang & Liu, Zhiying. 2014. Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization. The Scientific World Journal, Vol. 2014, no. 2014, pp.1-17.
https://search.emarefa.net/detail/BIM-1050395

Modern Language Association (MLA)

Yang, Jieming…[et al.]. Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization. The Scientific World Journal No. 2014 (2014), pp.1-17.
https://search.emarefa.net/detail/BIM-1050395

American Medical Association (AMA)

Yang, Jieming & Qu, Zhaoyang & Liu, Zhiying. Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization. The Scientific World Journal. 2014. Vol. 2014, no. 2014, pp.1-17.
https://search.emarefa.net/detail/BIM-1050395

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1050395