Sentence boundary detection without speech recognition : a case of an underresourced language

Joint Authors

Jamil, Nursuriati
Ramli, Muhammad Izzad
Siman, Nurayni

Source

Journal of Electrical Systems

Issue

Vol. 11, Issue 3 (30 Sep. 2015), pp.308-318, 11 p.

Publisher

Piercing Star House

Publication Date

2015-09-30

Country of Publication

Algeria

No. of Pages

11

Main Subjects

Electronic engineering

Abstract EN

Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends.

Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both.

Although the linguistic approach generally performs better than the acoustic approach, it requires a speech recognition component.

This is a constraint for under-resourced languages such as Malay.

This paper describes SBD for spontaneous spoken Malay audio.

Experiments are conducted on a forty-two-minute question-answer (Q/A) Malaysian parliamentary session comprising 12 adult male speakers and 4 female speakers.

The speech data are first classified into speech and non-speech segments, and only the non-speech segments are further tested as candidate sentence boundaries.

Seven prosodic features, rate-of-speech and volume are then extracted from the boundary candidates for classification.

Our proposed SBD method, using a supervised AdaBoost classifier, achieved a promising 100% accuracy rate with a 19.44% error rate.

For future work, we intend to reduce the error rate by implementing end-point detection on the boundary candidates.

American Psychological Association (APA)

Jamil, Nursuriati & Ramli, Muhammad Izzad & Siman, Nurayni. 2015. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems, Vol. 11, no. 3, pp.308-318.
https://search.emarefa.net/detail/BIM-610401

Modern Language Association (MLA)

Jamil, Nursuriati…[et al.]. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems Vol. 11, no. 3 (Sep. 2015), pp.308-318.
https://search.emarefa.net/detail/BIM-610401

American Medical Association (AMA)

Jamil, Nursuriati & Ramli, Muhammad Izzad & Siman, Nurayni. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems. 2015. Vol. 11, no. 3, pp.308-318.
https://search.emarefa.net/detail/BIM-610401

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references : p. 317-318

Record ID

BIM-610401