GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains

Joint Authors

Wei, Chih-Hsuan
Kao, Hung-Yu
Lu, Zhiyong

Source

BioMed Research International

Issue

Vol. 2015, Issue 2015 (31 Dec. 2015), pp.1-7, 7 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2015-08-25

Country of Publication

Egypt

No. of Pages

7

Main Subjects

Medicine

Abstract EN

The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications.

Despite significant previous research, only a small number of tools are publicly available, and these tools are typically restricted to detecting either mention-level gene names or document-level gene identifiers, but not both.

In this work, we report GNormPlus: an end-to-end, open-source system that handles both gene mention and identifier detection.

To support the development of GNormPlus, we created a new corpus of 694 PubMed articles containing manual annotations not only for gene names and their identifiers, but also for closely related concepts useful for gene name disambiguation, such as gene families and protein domains.

GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names.

As a result, GNormPlus compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset.

The GNormPlus source code and its annotated corpus are freely available, and the results of applying GNormPlus to the entire PubMed are freely accessible through our web-based tool PubTator.

American Psychological Association (APA)

Wei, Chih-Hsuan & Kao, Hung-Yu & Lu, Zhiyong. 2015. GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains. BioMed Research International, Vol. 2015, no. 2015, pp.1-7.
https://search.emarefa.net/detail/BIM-1057241

Modern Language Association (MLA)

Wei, Chih-Hsuan…[et al.]. GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains. BioMed Research International No. 2015 (2015), pp.1-7.
https://search.emarefa.net/detail/BIM-1057241

American Medical Association (AMA)

Wei, Chih-Hsuan & Kao, Hung-Yu & Lu, Zhiyong. GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains. BioMed Research International. 2015. Vol. 2015, no. 2015, pp.1-7.
https://search.emarefa.net/detail/BIM-1057241

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1057241