Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification

Co-authors

Zhang, Ying
Wang, Chao
Han, Shuguang

Source

BioMed Research International

Issue

Vol. 2020, no. 2020 (31 December 2020), pp. 1-11, 11 p.

Publisher

Hindawi Publishing Corporation

Publication date

2020-05-29

Country of publication

Egypt

Number of pages

11

Main subjects

Human medicine

Abstract (EN)

Fungi play essential roles in many ecological processes, and taxonomic classification is fundamental for microbial community characterization and vital for the study and preservation of fungal biodiversity.

To cope with massive fungal barcode data, tools are needed that can handle large volumes of barcode sequences, especially those from the internal transcribed spacer (ITS) region.

However, high variation in the ITS region and computational requirements for processing high-dimensional features remain challenging for existing predictors.

In this study, we developed Its2vec, a bioinformatics tool for the classification of fungal ITS barcodes to the species level.

An ITS database covering more than 25,000 species in a broad range of fungal taxa was assembled.

For dimensionality reduction, a word embedding algorithm was used to represent an ITS sequence as a dense low-dimensional vector.
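
As a rough illustration of the idea, the sketch below splits a sequence into overlapping k-mers ("words") and averages their vectors into one fixed-length vector. The k-mer length, the averaging step, and the tiny lookup table `toy_table` are assumptions made for this example; this record does not state which embedding algorithm, k value, or pooling method the authors used. In practice the table would come from a trained word embedding model rather than being written by hand.

```python
def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("words")."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq, table, k=3):
    """Average the vectors of all k-mers that appear in the lookup table."""
    vectors = [table[w] for w in kmers(seq, k) if w in table]
    if not vectors:
        raise ValueError("no known k-mers in sequence")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical 2-dimensional k-mer vectors, standing in for a trained model.
toy_table = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}

# A 5-base sequence becomes a single dense 2-dimensional vector.
vec = embed("ACGTA", toy_table)
```

Whatever the sequence length, the output has the dimension of the k-mer vectors, which is what makes the representation low-dimensional compared with raw k-mer count features.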

A random forest-based classifier was built for species identification.
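
To make the classification step concrete, here is a minimal random forest sketch in plain Python. It is heavily simplified: each tree is a one-split decision stump, and the feature vectors and labels `A` and `B` are invented toy data standing in for embedded ITS sequences and species names. The number of trees, the tree depth, and every other setting are assumptions for illustration, not the configuration used by the authors.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Find the best single-threshold split among the candidate features."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no split possible: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]      # bootstrap sample
        feats = rng.sample(range(n_features), k)       # feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy 4-dimensional vectors for two hypothetical species labels.
X = [[0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.0, 0.1], [0.0, 0.1, 0.2, 0.2],
     [0.9, 0.8, 1.0, 0.9], [1.0, 0.9, 0.8, 1.0], [0.8, 1.0, 0.9, 0.8]]
y = ["A", "A", "A", "B", "B", "B"]
forest = fit_forest(X, y)
```

A production classifier would use deeper trees and a tested library implementation; the sketch only shows the two ingredients that define a random forest, bootstrap sampling and random feature subsets combined by voting.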

Benchmarking results showed that our model achieved an accuracy comparable to that of several state-of-the-art predictors and, more importantly, that it could handle large datasets while greatly reducing feature dimensionality.

We expect the Its2vec model to be helpful for fungal species identification and, thus, for revealing microbial community structures and deepening our understanding of their functional mechanisms.

American Psychological Association (APA) citation style

Wang, Chao & Zhang, Ying & Han, Shuguang. 2020. Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BioMed Research International, Vol. 2020, no. 2020, pp. 1-11.
https://search.emarefa.net/detail/BIM-1132522

Modern Language Association (MLA) citation style

Wang, Chao…[et al.]. Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BioMed Research International No. 2020 (2020), pp. 1-11.
https://search.emarefa.net/detail/BIM-1132522

American Medical Association (AMA) citation style

Wang, Chao & Zhang, Ying & Han, Shuguang. Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BioMed Research International. 2020. Vol. 2020, no. 2020, pp. 1-11.
https://search.emarefa.net/detail/BIM-1132522

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record ID

BIM-1132522