PCA reduced forest for learning to rank spoken transcriptions
Joint Authors
Sabri, Faridah
Hadhud, Mayyadah
Darwish, Nifin
Source
Journal of Al-Azhar University Engineering Sector
Issue
Vol. 13, Issue 46 (31 Jan. 2018), pp.122-132, 11 p.
Publisher
al-Azhar University Faculty of Engineering
Publication Date
2018-01-31
Country of Publication
Egypt
No. of Pages
11
Abstract EN
This paper addresses the problem of learning to rank, specifically for spoken transcriptions.
The state-of-the-art approach for text and web documents is to apply machine learning techniques to learn a ranking model from labeled query-document pairs and their features.
One of the best state-of-the-art learning algorithms is the random forest; however, it does not perform well when features are dependent on, or are monotonic transformations of, other features, because this makes the trees of the forest less independent.
We propose applying principal component analysis (PCA) to bags of features in order to reduce them, which simplifies the model and yields a surrogate score for each field's features, producing a more independent set of features for the random forest.
Using this technique on a transcriptions dataset, we achieve a 4.32% improvement in expected reciprocal rank (ERR@10) and a 0.4% improvement in normalized discounted cumulative gain (NDCG@10) on the training data, with very comparable results on the testing data.
We confirm the effectiveness of the technique by applying it to the larger, benchmarked web documents dataset, Microsoft LETOR.
On the test data, improvements of 7.99% and 1.29% are achieved for the two metrics, respectively.
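The per-field reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the field layout, the toy feature values, and the choice of keeping exactly one principal component per field are all assumptions made for illustration. The first component is found by power iteration using only the Python standard library.

```python
import math
import random


def first_principal_scores(rows, iterations=200, seed=0):
    """Project each row onto the first principal component of `rows`.

    `rows` is a list of equal-length feature lists (one bag of features).
    The sign of the returned scores is arbitrary, as with any PCA.
    """
    n, d = len(rows), len(rows[0])
    # Center every feature column.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration from a random start to find the dominant eigenvector.
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # every feature is constant; nothing to project on
            return [0.0] * n
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def reduce_fields(rows, fields):
    """Replace each field's bag of features with one surrogate score.

    `fields` maps a (hypothetical) field name to the column indices of
    that field's features in `rows`.
    """
    per_field = {
        name: first_principal_scores([[r[j] for j in cols] for r in rows])
        for name, cols in fields.items()
    }
    return [[per_field[name][i] for name in fields] for i in range(len(rows))]


# Toy query-document rows (assumed values). Columns 0-1 belong to a
# "title" field and columns 2-3 to a "body" field; within each field the
# second feature is a monotonic transformation of the first.
rows = [
    [1.0, 2.0, 0.5, 1.5],
    [2.0, 4.0, 0.1, 0.3],
    [3.0, 6.0, 0.9, 2.7],
    [4.0, 8.0, 0.4, 1.2],
]
fields = {"title": [0, 1], "body": [2, 3]}
reduced = reduce_fields(rows, fields)
```

Each row of `reduced` now holds one score per field instead of two dependent features; in a full pipeline these reduced rows, together with the relevance labels, would be passed to a random forest learner, which is not shown here.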
American Psychological Association (APA)
Sabri, Faridah & Darwish, Nifin & Hadhud, Mayyadah. 2018. PCA reduced forest for learning to rank spoken transcriptions. Journal of Al-Azhar University Engineering Sector, Vol. 13, no. 46, pp. 122-132.
https://search.emarefa.net/detail/BIM-918377
Modern Language Association (MLA)
Sabri, Faridah…[et al.]. PCA reduced forest for learning to rank spoken transcriptions. Journal of Al-Azhar University Engineering Sector Vol. 13, no. 46 (Jan. 2018), pp. 122-132.
https://search.emarefa.net/detail/BIM-918377
American Medical Association (AMA)
Sabri, Faridah & Darwish, Nifin & Hadhud, Mayyadah. PCA reduced forest for learning to rank spoken transcriptions. Journal of Al-Azhar University Engineering Sector. 2018. Vol. 13, no. 46, pp. 122-132.
https://search.emarefa.net/detail/BIM-918377
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references.
Record ID
BIM-918377