Distance Variance Score: An Efficient Feature Selection Method in Text Classification

Authors

Wang, Heyong
Hong, Ming

Source

Mathematical Problems in Engineering

Issue

Vol. 2015, no. 2015 (31 December 2015), pp. 1-10, 10 p.

Publisher

Hindawi Publishing Corporation

Publication date

2015-05-11

Country of publication

Egypt

Number of pages

10

Main subjects

Civil Engineering

Abstract (EN)

With the rapid development of web applications such as social networks, a large amount of electronic text data has accumulated and become available on the Internet, which has driven growing interest in text mining.

Text classification is one of the most important subfields of text mining.

Before classification, text documents are usually represented as a high-dimensional, sparse document-term matrix (DTM).
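To make the DTM representation concrete, here is a minimal Python sketch that builds a count-based document-term matrix from a few toy documents and measures its sparsity (the fraction of zero cells). The toy documents, the naive lowercase whitespace tokenizer, and the helper names `build_dtm` and `sparsity` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def build_dtm(docs):
    """Build a dense document-term matrix of raw term counts.

    Assumption for illustration: tokens are produced by a lowercase
    whitespace split; real pipelines use proper tokenizers and stop-word lists.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    dtm = []
    for toks in tokenized:
        row = [0] * len(vocab)  # one column per vocabulary term
        for term, count in Counter(toks).items():
            row[index[term]] = count
        dtm.append(row)
    return vocab, dtm

def sparsity(dtm):
    """Fraction of cells in the matrix that are zero."""
    cells = sum(len(row) for row in dtm)
    zeros = sum(1 for row in dtm for v in row if v == 0)
    return zeros / cells

docs = [
    "feature selection for text classification",
    "text mining on social network data",
    "laplacian score for face recognition",
]
vocab, dtm = build_dtm(docs)
print(len(vocab), round(sparsity(dtm), 3))  # → 14 0.619
```

Even with three short documents, 26 of the 42 cells are zero. On a realistic corpus with a vocabulary of many thousands of terms, the zero fraction is typically much higher, which is why dimensionality and sparsity dominate the feature selection problem.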

Feature selection is essential for text classification because of the high dimensionality and sparsity of the DTM.

An efficient feature selection method can both reduce the dimensionality of the DTM and select discriminative features for text classification.

Laplacian Score (LS) is an unsupervised feature selection method that has been used successfully in areas such as face recognition.

However, LS is unable to select discriminative features for text classification or to effectively reduce the sparsity of the DTM.
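To make the LS baseline concrete, the following is a minimal pure-Python sketch of the Laplacian Score as it is commonly defined (He, Cai and Niyogi): build a k-nearest-neighbour graph over the samples with heat-kernel weights, then score each feature by how smoothly it varies over that graph, with lower scores meaning more important features. The function name `laplacian_scores`, the parameters `k` and `t`, and the toy matrix are illustrative assumptions. This is not the DVS method proposed in the paper, whose exact formula the abstract does not state.

```python
import math

def laplacian_scores(X, k=2, t=1.0):
    """Laplacian Score of each column of X (rows = samples, columns = features).

    Lower scores mark features that vary smoothly over the kNN graph,
    i.e. features that better preserve local neighbourhood structure.
    """
    n, m = len(X), len(X[0])
    # Pairwise squared Euclidean distances between samples.
    dist2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
             for i in range(n)]
    # Indices of the k nearest neighbours of each sample (itself excluded).
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist2[i][j])
        knn.append(set(others[:k]))
    # Symmetric heat-kernel weights on the kNN graph: S_ij = exp(-||x_i - x_j||^2 / t).
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in knn[i] or i in knn[j]):
                S[i][j] = math.exp(-dist2[i][j] / t)
    d = [sum(row) for row in S]  # node degrees, the diagonal of D
    total = sum(d)
    scores = []
    for r in range(m):
        f = [X[i][r] for i in range(n)]
        # Remove the degree-weighted mean so constant features are not favoured.
        mean = sum(fi * di for fi, di in zip(f, d)) / total
        ft = [fi - mean for fi in f]
        # f~^T L f~ with L = D - S equals 1/2 * sum_ij S_ij (f~_i - f~_j)^2.
        num = 0.5 * sum(S[i][j] * (ft[i] - ft[j]) ** 2
                        for i in range(n) for j in range(n))
        # f~^T D f~: degree-weighted variance of the feature.
        den = sum(di * v ** 2 for di, v in zip(d, ft))
        # A (numerically) constant feature carries no information: give it the worst score.
        scores.append(num / den if den > 1e-12 else math.inf)
    return scores

# Toy data: column 0 separates two groups of samples, column 1 varies within each group.
X = [[0.0, 0.0], [0.1, 1.0], [0.2, 2.0],
     [5.0, 0.1], [5.1, 1.1], [5.2, 2.1]]
scores = laplacian_scores(X, k=2, t=1.0)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])
print(ranking)  # → [0, 1]: the group-separating column ranks first
```

Because every sample's two nearest neighbours lie in its own group, the graph links only samples within a group, so column 0 (nearly constant inside each group) gets a much lower score than column 1.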

To address these limitations of LS, this paper proposes an unsupervised feature selection method named Distance Variance Score (DVS).

DVS uses the feature distance contribution (a ratio) to rank the importance of features in text documents and thereby select discriminative features.

Experimental results indicate that DVS is able to select discriminative features and reduce the sparsity of the DTM.

Thus, it is much more efficient than LS.

American Psychological Association (APA) citation style

Wang, Heyong & Hong, Ming. 2015. Distance Variance Score: An Efficient Feature Selection Method in Text Classification. Mathematical Problems in Engineering, Vol. 2015, no. 2015, pp. 1-10.
https://search.emarefa.net/detail/BIM-1074488

Modern Language Association (MLA) citation style

Wang, Heyong & Hong, Ming. Distance Variance Score: An Efficient Feature Selection Method in Text Classification. Mathematical Problems in Engineering No. 2015 (2015), pp. 1-10.
https://search.emarefa.net/detail/BIM-1074488

American Medical Association (AMA) citation style

Wang, Heyong & Hong, Ming. Distance Variance Score: An Efficient Feature Selection Method in Text Classification. Mathematical Problems in Engineering. 2015. Vol. 2015, no. 2015, pp. 1-10.
https://search.emarefa.net/detail/BIM-1074488

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record number

BIM-1074488