Design and Implementation of a Machine Learning-Based Authorship Identification Model

Co-authors

Bajwa, Imran Sarwar
Ramzan, Shabana
Anwar, Waheed

Source

Scientific Programming

Issue

Vol. 2019, no. 2019 (31 December 2019), pp. 1-14, 14 p.

Publisher

Hindawi Publishing Corporation

Publication date

2019-01-16

Country of publication

Egypt

Number of pages

14

Main subjects

Mathematics

Abstract EN

This paper presents a novel approach to authorship identification in English and Urdu text that applies a latent Dirichlet allocation (LDA) model to n-grams of the authors' texts and compares the results using cosine similarity.

The proposed approach uses similarity metrics to identify various learned representations of stylometric features and uses them to identify the writing style of a particular author.

The proposed LDA-based approach emphasizes instance-based and profile-based classifications of an author’s text.
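
As a rough illustration of the two classification styles named above (not the authors' implementation): a profile-based method usually pools all known texts of an author into one representation, while an instance-based method keeps one representation per text. The sketch below uses plain word-bigram counts as a stand-in for the LDA-based representations; the function names and the toy corpus are assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Return the word n-grams of a text as a list of tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def instance_representations(texts_by_author, n=2):
    """Instance-based: one n-gram count vector per individual text."""
    return {
        author: [Counter(word_ngrams(text, n)) for text in texts]
        for author, texts in texts_by_author.items()
    }


def profile_representations(texts_by_author, n=2):
    """Profile-based: pool all texts of an author into a single count vector."""
    profiles = {}
    for author, texts in texts_by_author.items():
        profile = Counter()
        for text in texts:
            # n-grams are extracted per text, so none span two documents
            profile.update(word_ngrams(text, n))
        profiles[author] = profile
    return profiles


# Hypothetical toy corpus, for illustration only
corpus = {
    "author_a": ["the cat sat on the mat", "the cat ran away"],
    "author_b": ["a dog barked at night"],
}
instances = instance_representations(corpus)
profiles = profile_representations(corpus)
```

Here `instances["author_a"]` holds two separate count vectors, whereas `profiles["author_a"]` holds a single pooled one.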

Here, LDA handles high-dimensional and sparse data by allowing a more expressive representation of text.

The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language.

A large corpus has been used for performance testing of the presented approach.

The experimental results show the superiority of the proposed approach over state-of-the-art representations and other algorithms used for authorship identification.

The main contribution of the presented work is the use of cosine similarity with n-gram-based LDA topics to measure the similarity between vector representations of text documents.
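
The similarity measure mentioned above can be sketched as follows. This is a minimal example, assuming each document or author is already represented as a topic-proportion vector (in the paper these come from LDA over n-grams; the vectors below are made up). The nearest-author decision rule and the names `cosine_similarity` and `most_similar_author` are assumptions for illustration, not the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def most_similar_author(doc_topics, author_topics):
    """Return the author whose topic vector is closest to the document's."""
    return max(
        author_topics,
        key=lambda author: cosine_similarity(doc_topics, author_topics[author]),
    )


# Hypothetical topic-proportion vectors over three topics
author_topics = {
    "author_a": [0.7, 0.2, 0.1],
    "author_b": [0.1, 0.3, 0.6],
}
unknown_doc = [0.6, 0.3, 0.1]
predicted = most_similar_author(unknown_doc, author_topics)
```

Because cosine similarity depends only on the direction of the vectors, two documents of very different lengths can still be judged similar if their topic proportions match.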

The approach achieves an overall accuracy of 84.52% on the PAN12 datasets and 93.17% on Urdu news articles without using any labels for the authorship identification task.

American Psychological Association (APA) citation style

Anwar, Waheed & Bajwa, Imran Sarwar & Ramzan, Shabana. 2019. Design and Implementation of a Machine Learning-Based Authorship Identification Model. Scientific Programming, Vol. 2019, no. 2019, pp. 1-14.
https://search.emarefa.net/detail/BIM-1210773

Modern Language Association (MLA) citation style

Anwar, Waheed…[et al.]. Design and Implementation of a Machine Learning-Based Authorship Identification Model. Scientific Programming No. 2019 (2019), pp. 1-14.
https://search.emarefa.net/detail/BIM-1210773

American Medical Association (AMA) citation style

Anwar, Waheed & Bajwa, Imran Sarwar & Ramzan, Shabana. Design and Implementation of a Machine Learning-Based Authorship Identification Model. Scientific Programming. 2019. Vol. 2019, no. 2019, pp. 1-14.
https://search.emarefa.net/detail/BIM-1210773

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record number

BIM-1210773