Sentence boundary detection without speech recognition : a case of an underresourced language

Co-authors

Jamil, Nursuriati
Ramli, Muhammad Izzad
Siman, Nurayni

Source

Journal of Electrical Systems

Issue

Vol. 11, no. 3 (30 September 2015), pp. 308-318, 11 p.

Publisher

دار النجم الثاقب

Publication Date

2015-09-30

Country of Publication

Algeria

Number of Pages

11

Main Subjects

Electrical Engineering

Abstract EN

Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends.

Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both.

Even though the linguistic approach generally performs better than the acoustic approach, it requires a speech recognition component.

This is a constraint for under-resourced languages such as Malay.

This paper describes SBD for spontaneous spoken Malay audio.

Experiments are conducted on a forty-two-minute question-answer (Q/A) Malaysian parliamentary session comprising 12 adult male speakers and 4 female speakers.

The speech data are first classified into speech/non-speech segments, and only the non-speech segments are further tested as sentence boundary candidates.

Seven prosodic features, rate-of-speech and volume are then extracted from the boundary candidates for classification.

Our proposed SBD method, using a supervised AdaBoost classifier, achieved a promising 100% accuracy rate with a 19.44% error rate.

For future work, we intend to reduce the error rate by implementing end-point detection on the boundary candidates.
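The two stages named in the abstract (keeping non-speech segments as boundary candidates, then classifying them with AdaBoost) can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give their features, thresholds, or classifier settings, so the energy threshold, the minimum pause length, the two toy features (pause duration and a pitch-drop score), and all function names below are assumptions chosen only for illustration.

```python
import math

def nonspeech_segments(energies, threshold, min_frames):
    """Return (start, end) frame ranges (end exclusive) of low-energy runs.

    Assumption: a frame counts as non-speech when its energy is below
    `threshold`; runs shorter than `min_frames` are discarded.
    """
    segments, start = [], None
    for i, e in enumerate(energies):
        if e < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(energies) - start >= min_frames:
        segments.append((start, len(energies)))
    return segments

def stump(x, feature, thresh, polarity):
    """Decision stump: vote `polarity` if x[feature] >= thresh, else the opposite."""
    return polarity if x[feature] >= thresh else -polarity

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost over decision stumps; labels in y must be -1 or +1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(w for w, x, label in zip(weights, X, y)
                              if stump(x, f, t, pol) != label)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol)
        err, f, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, pol))
        # Raise the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * label * stump(x, f, t, pol))
                   for w, x, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Return +1 (sentence boundary) or -1 (not a boundary) by weighted vote."""
    score = sum(a * stump(x, f, t, pol) for a, f, t, pol in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): [pause duration in seconds, pitch-drop score].
toy_X = [[0.8, 1.0], [0.9, 0.5], [0.1, 0.2], [0.2, 0.9], [0.7, 0.1], [0.15, 0.3]]
toy_y = [1, 1, -1, -1, 1, -1]
toy_model = train_adaboost(toy_X, toy_y, rounds=3)
```

In this sketch `nonspeech_segments` would supply the candidate pauses, a feature vector would be computed for each candidate, and `predict(toy_model, features)` would label it as a boundary or not. A library implementation such as scikit-learn's `AdaBoostClassifier` would normally replace the hand-written booster.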

American Psychological Association (APA) citation style

Jamil, Nursuriati & Ramli, Muhammad Izzad & Siman, Nurayni. 2015. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems, Vol. 11, no. 3, pp. 308-318.
https://search.emarefa.net/detail/BIM-610401

Modern Language Association (MLA) citation style

Jamil, Nursuriati…[et al.]. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems Vol. 11, no. 3 (Sep. 2015), pp. 308-318.
https://search.emarefa.net/detail/BIM-610401

American Medical Association (AMA) citation style

Jamil, Nursuriati & Ramli, Muhammad Izzad & Siman, Nurayni. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems. 2015. Vol. 11, no. 3, pp. 308-318.
https://search.emarefa.net/detail/BIM-610401

Data Type

Articles

Text Language

English

Notes

Includes bibliographical references: p. 317-318

Record ID

BIM-610401