Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

Co-authors

Huang, Joshua Zhexue
Nguyen, Thanh-Tung
Nguyen, Thuy Thi

Source

The Scientific World Journal

Issue

Vol. 2015, no. 2015 (31 December 2015), pp. 1-18, 18 p.

Publisher

Hindawi Publishing Corporation

Publication date

2015-03-24

Country of publication

Egypt

Number of pages

18

Main subjects

Medicine
Information Technology and Computer Science

Abstract EN

Random forests (RFs) have been widely used as a powerful classification method.

However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting.

As a result, RFs achieve poor accuracy on high-dimensional data.

In addition, feature selection in RFs is biased in favor of multivalued features.

Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data.

We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures.

This feature subset is then partitioned into two subsets.

A feature weighting sampling technique is used to sample features from these two subsets for building trees.

This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs.
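The steps described above can be illustrated with a minimal sketch. The abstract does not name the p-value test, the statistical measures, or the sampling weights, so everything concrete here is an assumption made for illustration, not the authors' xRF procedure: a label-permutation test on a difference-of-class-means score (binary labels 0/1), a 0.05 threshold, a median split into "strong" and "weak" subsets, and a roughly even draw from the two subsets at each node. All function names are hypothetical.

```python
import random
from statistics import mean, median

def score(column, labels):
    """Hypothetical relevance score: absolute difference of the two class means."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return abs(mean(pos) - mean(neg))

def permutation_p_value(column, labels, rng, n_perm=200):
    """Share of label shuffles scoring at least as high as the observed score."""
    observed = score(column, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score(column, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_and_partition(columns, labels, rng, alpha=0.05):
    """Drop features with p > alpha, then split the survivors at the median score."""
    kept = {}
    for j, col in enumerate(columns):
        if permutation_p_value(col, labels, rng) <= alpha:
            kept[j] = score(col, labels)
    if not kept:
        return [], [], kept
    cut = median(kept.values())
    strong = [j for j, s in kept.items() if s >= cut]
    weak = [j for j, s in kept.items() if s < cut]
    return strong, weak, kept

def weighted_sample(pool, weights, k, rng):
    """Weighted sampling without replacement; weights are the feature scores."""
    pool = list(pool)
    chosen = []
    for _ in range(min(k, len(pool))):
        pick = rng.choices(pool, weights=[weights[j] for j in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen

def sample_features(strong, weak, weights, mtry, rng):
    """Candidate features for one node: about half from each subset (assumed split)."""
    n_strong = min(len(strong), (mtry + 1) // 2)
    picked = weighted_sample(strong, weights, n_strong, rng)
    picked += weighted_sample(weak, weights, mtry - len(picked), rng)
    return picked
```

Here `columns` is a list of feature columns and `labels` a list of 0/1 class labels; a tree learner would call `sample_features` at each node in place of uniform feature sampling.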

An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets.

The experimental results show that RFs built with the proposed approach outperform existing random forests in both accuracy and AUC.

American Psychological Association (APA) citation style

Nguyen, Thanh-Tung & Huang, Joshua Zhexue & Nguyen, Thuy Thi. 2015. Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data. The Scientific World Journal, Vol. 2015, no. 2015, pp. 1-18.
https://search.emarefa.net/detail/BIM-1078787

Modern Language Association (MLA) citation style

Nguyen, Thanh-Tung…[et al.]. Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data. The Scientific World Journal No. 2015 (2015), pp. 1-18.
https://search.emarefa.net/detail/BIM-1078787

American Medical Association (AMA) citation style

Nguyen, Thanh-Tung & Huang, Joshua Zhexue & Nguyen, Thuy Thi. Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data. The Scientific World Journal. 2015. Vol. 2015, no. 2015, pp. 1-18.
https://search.emarefa.net/detail/BIM-1078787

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record number

BIM-1078787