Using WordNet for text categorization
Joint Authors
Rahmun, Abd al-Latif
al-Berrichi, Zakariyya
Bentaalah, Muhammad Amin
Source
The International Arab Journal of Information Technology
Issue
Vol. 5, Issue 1 (31 Jan. 2008), pp. 17-24, 8 p.
Publisher
Publication Date
2008-01-31
Country of Publication
Jordan
No. of Pages
8
Main Subjects
Information Technology and Computer Science
Abstract EN
This paper explores a method that uses WordNet concepts to categorize text documents.
The bag-of-words model commonly used to represent text is unsatisfactory because it ignores possible relations between terms.
The proposed method extracts generic concepts from WordNet for all the terms in the text, then combines them with the terms in different ways to form a new representative vector.
The effects of this method are examined in several experiments that use the multivariate chi-square to reduce dimensionality, the cosine distance, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data.
The proposed method is especially effective in raising the macro-averaged F1 value, which increased from 0.649 to 0.714 on Reuters-21578 and from 0.667 to 0.719 on 20 Newsgroups.
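The idea of enriching a bag-of-words vector with generic concepts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `TOY_HYPERNYMS` table is a hypothetical stand-in for WordNet lookups, the "add" and "replace" modes are assumed examples of the "different ways" of combining terms and concepts, and the chi-square reduction step is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: maps a term to its more generic concepts.
# A real system would query WordNet itself; these entries are illustrative only.
TOY_HYPERNYMS = {
    "dog": ["canine", "animal"],
    "cat": ["feline", "animal"],
    "car": ["vehicle"],
    "truck": ["vehicle"],
}


def concept_vector(tokens, mode="add"):
    """Build a term-frequency vector enriched with generic concepts.

    mode="add":     keep every term and also count its concepts
    mode="replace": count concepts instead of the term when any exist
    (Both modes are assumptions made for this sketch.)
    """
    if mode not in ("add", "replace"):
        raise ValueError("mode must be 'add' or 'replace'")
    vec = Counter()
    for tok in tokens:
        concepts = TOY_HYPERNYMS.get(tok, [])
        if mode == "add" or not concepts:
            vec[tok] += 1
        for concept in concepts:
            # Prefix concepts so they never collide with plain terms.
            vec["C:" + concept] += 1
    return vec


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = ["dog", "barks"]
doc2 = ["cat", "sleeps"]

# Plain bag of words: the documents share no terms.
plain = cosine_similarity(Counter(doc1), Counter(doc2))

# With concepts added, both documents share the generic concept "animal".
enriched = cosine_similarity(concept_vector(doc1), concept_vector(doc2))

print(plain, enriched)
```

In this toy case the plain vectors have no term in common, while the enriched vectors overlap on the shared concept, which is the kind of relation between terms that a plain bag of words ignores.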
American Psychological Association (APA)
al-Berrichi, Zakariyya & Rahmun, Abd al-Latif & Bentaalah, Muhammad Amin. 2008. Using WordNet for text categorization. The International Arab Journal of Information Technology, Vol. 5, no. 1, pp. 17-24.
https://search.emarefa.net/detail/BIM-10554
Modern Language Association (MLA)
Bentaalah, Muhammad Amin… [et al.]. Using WordNet for text categorization. The International Arab Journal of Information Technology Vol. 5, no. 1 (Jan. 2008), pp. 17-24.
https://search.emarefa.net/detail/BIM-10554
American Medical Association (AMA)
al-Berrichi, Zakariyya & Rahmun, Abd al-Latif & Bentaalah, Muhammad Amin. Using WordNet for text categorization. The International Arab Journal of Information Technology. 2008. Vol. 5, no. 1, pp. 17-24.
https://search.emarefa.net/detail/BIM-10554
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references : p. 23
Record ID
BIM-10554