A domain-based feature generation and convolution neural network approach for extracting adverse drug reactions from social media posts

Thesis author

Awdah, Firas

Thesis advisor

Tawil, Adil

University

Birzeit University

Faculty

Faculty of Engineering and Technology

Department

Department of Computer Science

University country

Palestine (West Bank)

Degree

Master's

Degree date

2018

English abstract

The recent popularity of social media networks, including forums, blogs, and micro-blogging platforms, has changed the way patients share their health experiences and treatment options.

Such forums offer valuable, unsolicited, uncensored information on drug safety and side effects directly from patients.

However, it is very challenging to extract useful information from such forums due to several factors, such as grammatical and spelling errors, colloquial language, and post length limits.

Furthermore, because adverse drug reaction (ADR) detection is a sensitive domain, it is more critical to identify ADRs correctly (i.e., to achieve higher classification precision) than to identify many imprecise ones.
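To make the precision-versus-recall trade-off concrete, here is a minimal sketch of how the two scores are computed for a binary ADR classifier. The labels below are illustrative values only, not data or results from the thesis, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'ADR'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels (assumed, not from the thesis): three true ADR posts.
y_true = [1, 1, 1, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0]   # flags few posts, all of them correct
eager = [1, 1, 1, 1, 1, 0]      # flags many posts, some of them wrong

print(precision_recall(y_true, cautious))
print(precision_recall(y_true, eager))
```

The cautious classifier never raises a false alarm but misses ADRs, while the eager one finds every ADR at the cost of false alarms; the abstract argues that, for this domain, the first kind of error profile is preferable.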

The aims of this thesis are: (i) to develop a new approach, called Semantic Vector (SemVec), for ADR classification in Twitter posts; (ii) to explore natural language processing (NLP) approaches for generating domain features from text and using them for ADR detection; and (iii) to improve the ADR classification precision of a convolutional neural network (CNN) by incorporating domain features.

This thesis proposes a dynamic and pluggable model, named SemVec, that represents each word as a vector of both domain and morphological features.

Depending on the problem domain, domain features can be added or removed to produce a word representation enriched with domain knowledge.

SemVec represents each post as a matrix of word vectors, which is fed into a CNN.

SemVec is scalable and can be applied to other domains by employing relevant natural language processing methods and domain lexicons.
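The idea of a pluggable word representation can be sketched roughly as follows. This is not the thesis's implementation: the lexicons, the four feature functions, the padding length, and the single hand-written convolution filter are all assumptions chosen for illustration, and a real system would use a trained CNN with learned filters.

```python
import numpy as np

# Hypothetical toy lexicons; the lexicons used in the thesis are not listed here.
ADR_LEXICON = {"nausea", "headache", "dizzy", "rash"}
DRUG_LEXICON = {"ibuprofen", "prozac"}

# Each feature maps a word to one number. All four are illustrative assumptions.
def in_adr_lexicon(word):
    return 1.0 if word.lower() in ADR_LEXICON else 0.0

def in_drug_lexicon(word):
    return 1.0 if word.lower() in DRUG_LEXICON else 0.0

def ends_with_ing(word):
    # Crude stand-in for a morphological feature.
    return 1.0 if word.lower().endswith("ing") else 0.0

def is_capitalized(word):
    return 1.0 if word[:1].isupper() else 0.0

# "Pluggable": add or remove functions here to change the representation.
FEATURES = [in_adr_lexicon, in_drug_lexicon, ends_with_ing, is_capitalized]

def word_vector(word, features=FEATURES):
    return np.array([f(word) for f in features], dtype=np.float32)

def post_matrix(post, max_len=8, features=FEATURES):
    """Represent a post as a (max_len x n_features) matrix, zero-padded."""
    mat = np.zeros((max_len, len(features)), dtype=np.float32)
    for i, word in enumerate(post.split()[:max_len]):
        mat[i] = word_vector(word, features)
    return mat

def conv_max_pool(mat, kernel):
    """Slide one filter over windows of consecutive words, then max-pool."""
    k = kernel.shape[0]
    scores = [float(np.sum(mat[i:i + k] * kernel))
              for i in range(mat.shape[0] - k + 1)]
    return max(scores)

matrix = post_matrix("Prozac gave me nausea")
print(matrix.shape)
print(conv_max_pool(matrix, np.ones((2, 4), dtype=np.float32)))
```

Passing a shorter list, for example `post_matrix(post, features=FEATURES[:2])`, yields a narrower matrix, which is one simple way to read the claim that domain features can be added or removed per problem domain.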

The proposed method was evaluated on a Twitter ADR dataset.

Results show that SemVec improves the precision of ADR detection by 13.43% over other state-of-the-art deep learning methods with a comparable recall score.

Main subjects

Information Technology and Computer Science

Number of pages

83

Table of contents

Table of contents.

Abstract.

Chapter One : Introduction.

Chapter Two : Background.

Chapter Three : Literature review.

Chapter Four : Proposed method.

Chapter Five : Evaluation and results.

Chapter Six : Conclusion.

References.

American Psychological Association (APA) citation style

Awdah, Firas. (2018). A domain-based feature generation and convolution neural network approach for extracting adverse drug reactions from social media posts. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-836449

Modern Language Association (MLA) citation style

Awdah, Firas. A domain-based feature generation and convolution neural network approach for extracting adverse drug reactions from social media posts. (Master's thesis). Birzeit University. (2018).
https://search.emarefa.net/detail/BIM-836449

American Medical Association (AMA) citation style

Awdah, Firas. (2018). A domain-based feature generation and convolution neural network approach for extracting adverse drug reactions from social media posts. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-836449

Text language

English

Data type

Theses

Record number

BIM-836449