Proposed method to enhance text document clustering using improved fuzzy c mean algorithm with named entity tag

Other Title(s)

طريقة مقترحة لتحسين عنقدة الوثائق النصية باستخدام خوارزمية العنقدة المضببة المحسنة مع علامات أسماء الكيانات

Joint Authors

Hadi, Raghad Muhammad
Mahmud, Abir Tariq
Hashim, Sukaynah Hasan

Source

al-Mansour

Issue

Vol. 2017, Issue 28 (31 Dec. 2017), pp.43-62, 20 p.

Publisher

al-Mansour University College

Publication Date

2017-12-31

Country of Publication

Iraq

No. of Pages

20

Main Subjects

Information Technology and Computer Science

Abstract EN

Text document clustering refers to grouping related text documents into clusters for unsupervised document organization, text data mining, and automatic topic extraction.

The most common document representation model is the vector space model (VSM), which represents a set of documents as vectors of significant terms. Traditional document clustering methods group related documents without any user interaction.
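
As an illustration of the vector space model, the following minimal sketch turns a few short documents into TF-IDF weighted term vectors. The whitespace tokenizer, the TF-IDF weighting, and the sample sentences are assumptions made for this example; the abstract does not state which term weighting the authors used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors); each vector holds one TF-IDF weight per term."""
    # Assumed tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Raw term count multiplied by log inverse document frequency.
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

# Hypothetical sample documents, not taken from the paper's dataset.
docs = ["oil prices rise", "oil exports fall", "wheat harvest rise"]
vocab, vectors = tfidf_vectors(docs)
```

Each document becomes one row with a column per vocabulary term, which is the form a clustering algorithm consumes.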

The method proposed in this paper is an attempt to discover how clustering might be improved with user guidance by selecting features that separate documents.

These features are the tags that appear in documents, such as named entity tags, which denote information important for cluster names in the text. The method introduces a document representation model that builds combined features from named entity tags and uses an improved fuzzy clustering algorithm.
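
One possible way to build such combined features is to append per-tag entity counts to a document's term vector. The sketch below is only an assumption about how the combination could work: the tag set in NE_TAGS, the combined_features helper, the weight parameter, and the sample entities are all hypothetical, and the abstract does not describe the authors' actual construction.

```python
from collections import Counter

# Assumed tag set; a real tagger may emit different labels.
NE_TAGS = ["PERSON", "ORGANIZATION", "LOCATION"]

def combined_features(term_vector, entities, weight=1.0):
    """Append weighted per-tag entity counts to a document's term vector.

    entities is a list of (text, tag) pairs produced by some upstream
    named entity tagger (hypothetical input format).
    """
    counts = Counter(tag for _, tag in entities)
    return list(term_vector) + [weight * counts[tag] for tag in NE_TAGS]

# Hypothetical term weights and tagged entities for one document.
features = combined_features(
    [0.0, 1.1, 0.4],
    [("OPEC", "ORGANIZATION"), ("Iraq", "LOCATION"), ("Kuwait", "LOCATION")],
)
```

The weight parameter lets the entity counts be scaled up or down relative to the term weights before clustering.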

The proposed method is tested at two levels: the first level uses only the vector space model with the traditional fuzzy c-means algorithm, and the second level uses the vector space model with combined named entity tag features and the improved fuzzy c-means algorithm. Both levels use a subset of the Reuters-21578 dataset that contains 1150 documents on ten topics (150 documents for each topic).
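
For reference, the traditional fuzzy c-means algorithm used at the first level can be sketched as follows. This is the standard textbook formulation with fuzzifier m, not the improved variant proposed in the paper, whose details the abstract does not give. The function name, default parameters, and toy data points are assumptions for illustration.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )
        # Membership update from Euclidean distances (floored to avoid division by zero).
        new_u = []
        for p in points:
            dist = [
                max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            new_u.append(
                [
                    1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                    for j in range(c)
                ]
            )
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u

# Toy 2-D points forming two well-separated groups (not the Reuters data).
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centers, memberships = fuzzy_c_means(points, c=2)
# Hard label per point: the cluster with the highest membership.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
```

Unlike hard clustering, each document receives a degree of membership in every cluster, and a hard assignment is taken only at the end by choosing the largest membership.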

The results show that using the second level as the clustering technique for text documents achieves good performance, with an average categorization accuracy of 90%.

American Psychological Association (APA)

Hadi, Raghad Muhammad & Hashim, Sukaynah Hasan & Mahmud, Abir Tariq. 2017. Proposed method to enhance text document clustering using improved fuzzy c mean algorithm with named entity tag. al-Mansour, Vol. 2017, no. 28, pp.43-62.
https://search.emarefa.net/detail/BIM-760760

Modern Language Association (MLA)

Hadi, Raghad Muhammad…[et al.]. Proposed method to enhance text document clustering using improved fuzzy c mean algorithm with named entity tag. al-Mansour No. 28 (2017), pp.43-62.
https://search.emarefa.net/detail/BIM-760760

American Medical Association (AMA)

Hadi, Raghad Muhammad & Hashim, Sukaynah Hasan & Mahmud, Abir Tariq. Proposed method to enhance text document clustering using improved fuzzy c mean algorithm with named entity tag. al-Mansour. 2017. Vol. 2017, no. 28, pp.43-62.
https://search.emarefa.net/detail/BIM-760760

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 61

Record ID

BIM-760760