Experimenting N-grams in text categorization
Joint Authors
Rahmun, Abd al-Latif
al-Berrichi, Zakariyya
Source
The International Arab Journal of Information Technology
Issue
Vol. 4, Issue 4 (31 Oct. 2007), pp.377-385, 9 p.
Publication Date
2007-10-31
Country of Publication
Jordan
No. of Pages
9
Main Subjects
Information Technology and Computer Science
Abstract EN
This paper deals with automatic supervised classification of documents.
The approach suggested is based on a vector representation of the documents centered not on the words but on the n-grams of characters for varying n.
The effects of this method are examined in several experiments using the multivariate chi-square to reduce dimensionality, the cosine and Kullback-Leibler distances, and two benchmark corpora, the Reuters-21578 newswire articles and the 20 Newsgroups data, for evaluation.
The evaluation was done using the macro-averaged F1 measure.
The results show the effectiveness of this approach compared to the bag-of-words and stem representations.
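To illustrate the representation the abstract describes, the following is a minimal Python sketch of character n-gram profiles compared with the cosine measure and the Kullback-Leibler divergence. It is not the authors' implementation: the function names, the choice of n values (2 to 4), the lowercasing and whitespace normalization, and the additive smoothing constant are all assumptions made for this example, and the chi-square feature selection step is omitted.

```python
from collections import Counter
import math


def char_ngrams(text, n_values=(2, 3, 4)):
    """Count overlapping character n-grams for each n in n_values.

    Lowercasing and whitespace collapsing are assumptions of this sketch.
    """
    normalized = " ".join(text.lower().split())
    counts = Counter()
    for n in n_values:
        for i in range(len(normalized) - n + 1):
            counts[normalized[i:i + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def kl_divergence(a, b, epsilon=1e-6):
    """KL(P_a || P_b) over the union vocabulary.

    Additive smoothing with epsilon (an assumption) keeps every
    probability strictly positive so the logarithm is defined.
    """
    vocab = a.keys() | b.keys()
    total_a = sum(a.values()) + epsilon * len(vocab)
    total_b = sum(b.values()) + epsilon * len(vocab)
    kl = 0.0
    for g in vocab:
        p = (a[g] + epsilon) / total_a
        q = (b[g] + epsilon) / total_b
        kl += p * math.log(p / q)
    return kl


if __name__ == "__main__":
    doc = char_ngrams("Oil prices rose sharply on Monday")
    category = char_ngrams("Crude oil prices fell on Tuesday")
    print(cosine_similarity(doc, category))
    print(kl_divergence(doc, category))
```

In a classifier of this kind, a document would typically be assigned to the category whose profile gives the highest cosine similarity or the lowest divergence; how the paper builds its category profiles is not stated in this record.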
American Psychological Association (APA)
Rahmun, Abd al-Latif & al-Berrichi, Zakariyya. 2007. Experimenting N-grams in text categorization. The International Arab Journal of Information Technology, Vol. 4, no. 4, pp. 377-385.
https://search.emarefa.net/detail/BIM-11745
Modern Language Association (MLA)
Rahmun, Abd al-Latif & al-Berrichi, Zakariyya. Experimenting N-grams in text categorization. The International Arab Journal of Information Technology Vol. 4, no. 4 (Oct. 2007), pp. 377-385.
https://search.emarefa.net/detail/BIM-11745
American Medical Association (AMA)
Rahmun, Abd al-Latif & al-Berrichi, Zakariyya. Experimenting N-grams in text categorization. The International Arab Journal of Information Technology. 2007. Vol. 4, no. 4, pp. 377-385.
https://search.emarefa.net/detail/BIM-11745
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references: p. 384
Record ID
BIM-11745