Distance Variance Score: An Efficient Feature Selection Method in Text Classification

Joint Authors

Wang, Heyong
Hong, Ming

Source

Mathematical Problems in Engineering

Issue

Vol. 2015, Issue 2015 (31 Dec. 2015), pp.1-10, 10 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2015-05-11

Country of Publication

Egypt

No. of Pages

10

Main Subjects

Civil Engineering

Abstract EN

With the rapid development of web applications such as social networks, a large amount of electronic text data has accumulated and become available on the Internet, which has led to increasing interest in text mining.

Text classification is one of the most important subfields of text mining.

Before classification, text documents are often represented as a high-dimensional, sparse document-term matrix (DTM).
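
To make the representation concrete, here is a minimal Python sketch that builds a small document-term matrix of raw term counts and measures its sparsity as the fraction of zero entries. The toy documents, the naive lowercase whitespace tokenizer, and the helper names `build_dtm` and `sparsity` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def build_dtm(docs):
    """Build a dense document-term matrix of raw term counts.

    Rows are documents; columns are vocabulary terms sorted alphabetically.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    dtm = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        dtm.append(row)
    return vocab, dtm


def sparsity(dtm):
    """Return the fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in dtm)
    zeros = sum(1 for row in dtm for v in row if v == 0)
    return zeros / total


docs = [
    "cheap flights to paris",
    "paris hotel deals",
    "python feature selection",
]
vocab, dtm = build_dtm(docs)
print(vocab)
print(dtm)
print(round(sparsity(dtm), 3))
```

Even with three short documents, the matrix has 9 columns and 17 of its 27 entries are zero; with thousands of documents and a vocabulary of tens of thousands of terms, the proportion of zeros is typically far higher.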

Feature selection is essential for text classification because of the high dimensionality and sparsity of the DTM.

An efficient feature selection method can both reduce the dimensionality of the DTM and select discriminative features for text classification.
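
In practice, feature selection of this kind scores every column (term) of the DTM and keeps only the best-scoring ones. The minimal sketch below assumes the scores have already been computed by some method; the scores are made up for illustration, and the helper `select_top_k` is hypothetical. It shows how keeping k columns shrinks the matrix and how the sparsity of the result can be compared with the original.

```python
def sparsity(dtm):
    """Return the fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in dtm)
    zeros = sum(1 for row in dtm for v in row if v == 0)
    return zeros / total


def select_top_k(dtm, scores, k, larger_is_better=True):
    """Keep the k columns of `dtm` with the best scores.

    Returns (kept_column_indices, reduced_dtm). The reduced matrix keeps
    the original left-to-right column order.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j],
                   reverse=larger_is_better)
    kept = sorted(order[:k])
    reduced = [[row[j] for j in kept] for row in dtm]
    return kept, reduced


# Toy 3-document x 4-term matrix with hypothetical per-term scores.
dtm = [
    [1, 0, 0, 2],
    [0, 1, 0, 1],
    [3, 0, 1, 0],
]
scores = [0.9, 0.1, 0.2, 0.8]

kept, reduced = select_top_k(dtm, scores, k=2)
print(kept, reduced)
print(sparsity(dtm), sparsity(reduced))
```

Here columns 0 and 3 are kept, the matrix goes from 4 to 2 columns, and its sparsity drops from 0.5 to 1/3. Whether sparsity drops in general depends entirely on which columns the scoring method favours.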

Laplacian Score (LS) is an unsupervised feature selection method that has been used successfully in areas such as face recognition.
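
For reference, the following is a minimal pure-Python sketch of the Laplacian Score as it is commonly formulated (He, Cai and Niyogi, 2005). It builds a k-nearest-neighbour graph over the samples with heat-kernel weights S, forms the diagonal degree matrix D and the graph Laplacian L = D - S, centres each feature by its D-weighted mean, and scores it as (f^T L f) / (f^T D f). Smaller scores indicate features that better preserve local neighbourhood structure. The parameter names `k` and `t`, and returning infinity for a constant feature, are assumptions of this sketch rather than details from the abstract.

```python
import math


def laplacian_score(X, k=2, t=1.0):
    """Laplacian Score of each column of X (list of equal-length rows).

    Smaller is better. k: neighbours per sample; t: heat-kernel width.
    """
    n, d = len(X), len(X[0])

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    dist = [[sqdist(X[i], X[j]) for j in range(n)] for i in range(n)]

    # k nearest neighbours of each sample, excluding the sample itself.
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(set(others[:k]))

    # Symmetric kNN graph with heat-kernel edge weights.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in neighbours[i] or i in neighbours[j]):
                S[i][j] = math.exp(-dist[i][j] / t)
    D = [sum(row) for row in S]  # diagonal of the degree matrix

    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        # Remove the D-weighted mean so constant shifts are ignored.
        mean = sum(f[i] * D[i] for i in range(n)) / sum(D)
        ft = [v - mean for v in f]
        # f^T L f = 1/2 * sum_ij S_ij (f_i - f_j)^2
        num = 0.5 * sum(S[i][j] * (ft[i] - ft[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(D[i] * ft[i] ** 2 for i in range(n))  # f^T D f
        scores.append(num / den if den > 0 else math.inf)
    return scores


# Two clusters separated along feature 0; feature 1 alternates within clusters.
X = [[0.0, 1.0], [0.1, 0.0], [5.0, 1.0], [5.1, 0.0]]
print(laplacian_score(X, k=1))
```

In this toy example, feature 0 varies smoothly over the neighbourhood graph and receives a much smaller (better) score than feature 1, which changes between every pair of neighbours.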

However, LS is unable to select discriminative features for text classification or to effectively reduce the sparsity of the DTM.

To address these limitations, this paper proposes an unsupervised feature selection method named Distance Variance Score (DVS).

DVS uses each feature's distance contribution (a ratio) to rank the importance of features for text documents and thereby select discriminative features.
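
The abstract does not give the DVS formula. The sketch below is therefore only one hypothetical reading of "feature distance contribution (a ratio)", not the authors' definition. For each feature, it computes that feature's share of the total pairwise squared Euclidean distance between documents. Because sum over i<j of (x_ir - x_jr)^2 equals n times the sum of squared deviations of feature r from its mean, this ratio is the same as the feature's share of the total variance. Under this reading, larger values mark features that spread documents apart more. The function name `distance_contribution` is hypothetical.

```python
def distance_contribution(X):
    """Hypothetical ratio (NOT the paper's DVS): each feature's share of the
    total pairwise squared Euclidean distance between the rows of X.

    Returns one value per column; the values sum to 1 unless all rows are equal.
    """
    n, d = len(X), len(X[0])
    per_feature = [0.0] * d
    # Accumulate squared differences over all unordered document pairs.
    for i in range(n):
        for j in range(i + 1, n):
            for r in range(d):
                per_feature[r] += (X[i][r] - X[j][r]) ** 2
    total = sum(per_feature)
    if total == 0:
        return [0.0] * d  # all documents identical: no feature contributes
    return [c / total for c in per_feature]


# Toy DTM: term 0 varies most, term 1 less, term 2 is constant.
X = [
    [2, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
print(distance_contribution(X))
```

For this toy matrix, the pairwise squared distances attributable to the three terms are 6, 2 and 0, so the ratios are 0.75, 0.25 and 0.0. A constant term such as term 2 contributes nothing and would be ranked last.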

Experimental results indicate that DVS is able to select discriminative features and reduce the sparsity of the DTM.

Thus, it is much more efficient than LS.

American Psychological Association (APA)

Wang, Heyong & Hong, Ming. 2015. Distance Variance Score: An Efficient Feature Selection Method in Text Classification. Mathematical Problems in Engineering, Vol. 2015, no. 2015, pp.1-10.
https://search.emarefa.net/detail/BIM-1074488

Modern Language Association (MLA)

Wang, Heyong & Hong, Ming. Distance Variance Score: An Efficient Feature Selection Method in Text Classification. Mathematical Problems in Engineering No. 2015 (2015), pp.1-10.
https://search.emarefa.net/detail/BIM-1074488

American Medical Association (AMA)

Wang, Heyong & Hong, Ming. Distance Variance Score: An Efficient Feature Selection Method in Text Classification. Mathematical Problems in Engineering. 2015. Vol. 2015, no. 2015, pp.1-10.
https://search.emarefa.net/detail/BIM-1074488

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1074488