PCA reduced forest for learning to rank spoken transcriptions

Joint Authors

Sabri, Faridah
Hadhud, Mayyadah
Darwish, Nifin

Source

Journal of Al-Azhar University Engineering Sector

Issue

Vol. 13, Issue 46 (31 Jan. 2018), pp.122-132, 11 p.

Publisher

al-Azhar University Faculty of Engineering

Publication Date

2018-01-31

Country of Publication

Egypt

No. of Pages

11

Main Subjects

Electronic engineering

Abstract EN

This paper addresses the problem of learning to rank, specifically for spoken transcriptions.

The state-of-the-art approach for text and web documents is to apply machine learning techniques to learn a ranking model from labeled query-document pairs and their features.

One of the best state-of-the-art learning algorithms is the random forest; however, it does not perform well when features are dependent on, or are monotonic transformations of, other features, because this makes the trees of the forest less independent.

We propose applying principal component analysis (PCA) to bags of features in order to reduce them. This simplifies the model and yields a surrogate score for each field's features, producing a more independent set of features for the random forest.
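
The reduction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the field names ("title", "body"), the column grouping, the synthetic data, and the choice to keep only the first principal component per field are all assumptions made for illustration. The abstract does not state how many components are kept or how features are scaled.

```python
import numpy as np

def first_pc_scores(block):
    """Project one field's feature block onto its first principal component."""
    centered = block - block.mean(axis=0)
    # Standardize so no feature dominates purely because of its scale.
    std = centered.std(axis=0)
    std[std == 0] = 1.0
    z = centered / std
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]

def reduce_by_field(X, field_columns):
    """Replace each field's bag of features with one surrogate score."""
    return np.column_stack(
        [first_pc_scores(X[:, cols]) for cols in field_columns.values()]
    )

# Synthetic data (assumption): each hypothetical field has three features
# that are noisy or monotonic transformations of one underlying signal.
rng = np.random.default_rng(0)
n = 200
title_base = rng.normal(size=n)
body_base = rng.normal(size=n)
X = np.column_stack([
    title_base,
    2.0 * title_base + 0.1 * rng.normal(size=n),
    np.exp(title_base),
    body_base,
    -3.0 * body_base + 0.1 * rng.normal(size=n),
    body_base ** 3,
])
field_columns = {"title": [0, 1, 2], "body": [3, 4, 5]}

X_reduced = reduce_by_field(X, field_columns)  # one column per field
```

The reduced matrix `X_reduced` would then take the place of the original feature matrix when training a random forest ranker with any standard implementation.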

Using this technique on a transcriptions dataset, we achieve a 4.32% improvement in expected reciprocal rank (ERR@10) and a 0.4% improvement in normalized discounted cumulative gain (NDCG@10) on the training data, with very comparable results on the testing data.

We confirm the effectiveness of the technique by applying it to a larger, benchmarked web document dataset, Microsoft LETOR.

Improvements of 7.99% and 1.29% on the test data are achieved for the two metrics, respectively.
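
For readers unfamiliar with the two metrics, the following sketch computes ERR@10 and NDCG@10 for a single ranked list of graded relevance labels. It uses the commonly cited definitions with exponential gain (2^g - 1); the abstract does not say which variant the paper uses, and the `max_grade` default of 4 is an assumption.

```python
import math

def err_at_k(grades, k=10, max_grade=4):
    """Expected reciprocal rank over the top-k results of one query."""
    err = 0.0
    p_continue = 1.0  # probability the user has not stopped yet
    for rank, g in enumerate(grades[:k], start=1):
        p_stop = (2 ** g - 1) / (2 ** max_grade)  # chance this result satisfies
        err += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return err

def dcg_at_k(grades, k=10):
    """Discounted cumulative gain with exponential gain and log2 discount."""
    return sum(
        (2 ** g - 1) / math.log2(rank + 1)
        for rank, g in enumerate(grades[:k], start=1)
    )

def ndcg_at_k(grades, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

# Relevance grades of the documents in the order the ranker returned them.
ranked_grades = [3, 0, 2, 1]
print(err_at_k(ranked_grades), ndcg_at_k(ranked_grades))
```

In an evaluation, both values would be averaged over all queries in the dataset.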

American Psychological Association (APA)

Sabri, Faridah & Darwish, Nifin & Hadhud, Mayyadah. 2018. PCA reduced forest for learning to rank spoken transcriptions. Journal of Al-Azhar University Engineering Sector, Vol. 13, no. 46, pp. 122-132.
https://search.emarefa.net/detail/BIM-918377

Modern Language Association (MLA)

Sabri, Faridah…[et al.]. PCA reduced forest for learning to rank spoken transcriptions. Journal of Al-Azhar University Engineering Sector Vol. 13, no. 46 (Jan. 2018), pp.122-132.
https://search.emarefa.net/detail/BIM-918377

American Medical Association (AMA)

Sabri, Faridah & Darwish, Nifin & Hadhud, Mayyadah. PCA reduced forest for learning to rank spoken transcriptions. Journal of Al-Azhar University Engineering Sector. 2018. Vol. 13, no. 46, pp. 122-132.
https://search.emarefa.net/detail/BIM-918377

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references.

Record ID

BIM-918377