Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition

Publication Date

2019-03-03

Country of Publication

Egypt

No. of Pages

8

Main Subjects

Biology

Abstract EN

Serbian belongs to a group of highly inflective, morphologically rich languages that use many different word suffixes to express grammatical, syntactic, or semantic features.

This behaviour usually produces many recognition errors, especially in large vocabulary systems: even when good acoustic matching lets the automatic speech recognition system predict the correct lemma, the word ending is often wrong, and the whole word is still counted as an error.

This effect is larger for contexts not present in the language model training corpus.

In this manuscript, an approach which takes into account different morphological categories of words for language modeling is examined, and the benefits in terms of word error rates and perplexities are presented.

These categories include word type, word case, grammatical number, and gender, and they were all assigned to words in the system vocabulary, where applicable.

These additional word features helped to produce significant improvements in relation to the baseline system, both for n-gram-based and neural network-based language models.

The proposed system can help overcome many such word-ending errors in large vocabulary systems, for example in dictation, both for Serbian and for other languages with similar characteristics.

American Psychological Association (APA)

Pakoci, Edvin & Popović, Branislav & Pekar, Darko. 2019. Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition. Computational Intelligence and Neuroscience, Vol. 2019, no. 2019, pp. 1-8.
https://search.emarefa.net/detail/BIM-1129482

Modern Language Association (MLA)

Pakoci, Edvin, et al. Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition. Computational Intelligence and Neuroscience No. 2019 (2019), pp. 1-8.
https://search.emarefa.net/detail/BIM-1129482

American Medical Association (AMA)

Pakoci, Edvin & Popović, Branislav & Pekar, Darko. Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition. Computational Intelligence and Neuroscience. 2019. Vol. 2019, no. 2019, pp. 1-8.
https://search.emarefa.net/detail/BIM-1129482

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1129482
