Design and Implementation of a Machine Learning-Based Authorship Identification Model

Joint Authors

Bajwa, Imran Sarwar
Ramzan, Shabana
Anwar, Waheed

Source

Scientific Programming

Issue

Vol. 2019, Issue 2019 (31 Dec. 2019), pp.1-14, 14 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2019-01-16

Country of Publication

Egypt

No. of Pages

14

Main Subjects

Mathematics

Abstract EN

This paper presents a novel approach to authorship identification in English and Urdu text that combines the latent Dirichlet allocation (LDA) model, applied to n-grams of the authors' texts, with cosine similarity.

The proposed approach uses similarity metrics to identify various learned representations of stylometric features and uses them to identify the writing style of a particular author.

The proposed LDA-based approach emphasizes instance-based and profile-based classifications of an author’s text.
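
The distinction between the two classification styles can be illustrated with a minimal sketch. It is not taken from the paper: the toy corpus, the author names, and the helper `word_ngrams` are hypothetical, and word bigrams are assumed only for illustration. In the instance-based view each document keeps its own n-gram counts, while in the profile-based view all texts by one author are pooled into a single profile.

```python
from collections import Counter, defaultdict


def word_ngrams(text, n=2):
    """Return the word n-grams of a text as a list of tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical toy corpus of (author, text) pairs, for illustration only.
corpus = [
    ("author_a", "the rain fell on the quiet town"),
    ("author_a", "the rain fell again at night"),
    ("author_b", "markets rose sharply on monday"),
]

# Instance-based: one n-gram count vector per document.
instances = [(author, Counter(word_ngrams(text))) for author, text in corpus]

# Profile-based: one n-gram count vector per author, pooling all their texts.
profiles = defaultdict(Counter)
for author, text in corpus:
    profiles[author].update(word_ngrams(text))
```

Either representation could then be passed to a topic model; which one the paper feeds to LDA at each stage is not stated in this abstract.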

Here, LDA suitably handles high-dimensional and sparse data by allowing more expressive representation of text.

The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language.

A large corpus has been used for performance testing of the presented approach.

The experimental results show that the proposed approach outperforms state-of-the-art representations and other algorithms used for authorship identification.

The contribution of the presented work is the use of cosine similarity with n-gram-based LDA topics to measure the similarity between vectors of text documents.

The approach achieves an overall accuracy of 84.52% on the PAN12 datasets and 93.17% on Urdu news articles without using any labels for the authorship identification task.
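
As a rough illustration of the similarity step, the sketch below compares topic vectors with cosine similarity and attributes an unseen document to the closest author. It assumes the topic distributions have already been inferred by an LDA model; the three-topic vectors, the author names, and the function `most_similar_author` are hypothetical and are not the paper's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def most_similar_author(doc_topics, author_topics):
    """Return the author whose topic vector is closest to the document's."""
    return max(
        author_topics,
        key=lambda author: cosine_similarity(doc_topics, author_topics[author]),
    )


# Hypothetical topic distributions, assumed to come from a fitted LDA model.
author_topics = {
    "author_a": [0.7, 0.2, 0.1],
    "author_b": [0.1, 0.3, 0.6],
}
unknown_doc = [0.6, 0.3, 0.1]

predicted = most_similar_author(unknown_doc, author_topics)
```

Because no author labels are used to fit the topic model itself, a nearest-profile rule of this kind is one plausible way to turn topic vectors into an attribution; the abstract does not specify the exact decision rule.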

American Psychological Association (APA)

Anwar, Waheed & Bajwa, Imran Sarwar & Ramzan, Shabana. 2019. Design and Implementation of a Machine Learning-Based Authorship Identification Model. Scientific Programming, Vol. 2019, no. 2019, pp.1-14.
https://search.emarefa.net/detail/BIM-1210773

Modern Language Association (MLA)

Anwar, Waheed…[et al.]. Design and Implementation of a Machine Learning-Based Authorship Identification Model. Scientific Programming No. 2019 (2019), pp.1-14.
https://search.emarefa.net/detail/BIM-1210773

American Medical Association (AMA)

Anwar, Waheed & Bajwa, Imran Sarwar & Ramzan, Shabana. Design and Implementation of a Machine Learning-Based Authorship Identification Model. Scientific Programming. 2019. Vol. 2019, no. 2019, pp.1-14.
https://search.emarefa.net/detail/BIM-1210773

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1210773