Sentence boundary detection without speech recognition : a case of an underresourced language
Joint Authors
Jamil, Nursuriati
Ramli, Muhammad Izzad
Siman, Nurayni
Source
Issue
Vol. 11, Issue 3 (30 Sep. 2015), pp.308-318, 11 p.
Publisher
Publication Date
2015-09-30
Country of Publication
Algeria
No. of Pages
11
Main Subjects
Abstract EN
Sentence boundary detection (SBD), also known as sentence segmentation, decides where a sentence begins and ends.
Previous SBD methods use a linguistic approach, an acoustic approach, or a combination of both.
Even though the linguistic approach generally performs better than the acoustic approach, it requires a speech recognition component.
This is a constraint for under-resourced languages such as Malay.
This paper describes SBD for spontaneous spoken Malay audio.
Experiments are conducted on a forty-two-minute question-answer (Q/A) Malaysian parliamentary session comprising 12 adult male speakers and 4 female speakers.
The speech data are first classified into speech and non-speech segments, and only the non-speech segments are further tested as sentence boundary candidates.
Seven prosodic features, rate-of-speech and volume are then extracted from the boundary candidates for classification.
Our proposed SBD method, using a supervised AdaBoost classifier, achieved a promising 100% accuracy rate with a 19.44% error rate.
For future work, we intend to reduce the error rate by implementing end-point detection on the boundary candidates.
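The candidate-selection step described in the abstract (keep only non-speech segments as possible sentence boundaries) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how speech and non-speech were separated, so the energy threshold, the frame length, the minimum pause length, and the function names frame_energies and non_speech_candidates are all assumptions made for illustration. Feature extraction and the AdaBoost classification stage are not covered here.

```python
import math

def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame (assumed framing)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def non_speech_candidates(energies, threshold, min_frames):
    """Return (start_frame, end_frame) runs whose energy stays below threshold.

    end_frame is exclusive. Runs shorter than min_frames are ignored.
    The threshold rule is a stand-in for whatever speech/non-speech
    classifier is actually used.
    """
    candidates = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_frames:
                candidates.append((run_start, i))
            run_start = None
    # Close a quiet run that extends to the end of the signal.
    if run_start is not None and len(energies) - run_start >= min_frames:
        candidates.append((run_start, len(energies)))
    return candidates

# Toy signal: speech, long pause, speech, very short pause, speech.
signal = [0.5, -0.5] * 4 + [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 2 + [0.5, -0.5] * 2
energies = frame_energies(signal, frame_len=2)
candidates = non_speech_candidates(energies, threshold=0.1, min_frames=2)
print(candidates)
```

In this toy example only the long pause survives as a boundary candidate; the one-frame pause is discarded by the minimum-length rule. In a fuller pipeline, each surviving candidate would then be described by features such as those named in the abstract and passed to a classifier.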
American Psychological Association (APA)
Jamil, Nursuriati & Ramli, Muhammad Izzad & Siman, Nurayni. 2015. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems, Vol. 11, no. 3, pp.308-318.
https://search.emarefa.net/detail/BIM-610401
Modern Language Association (MLA)
Jamil, Nursuriati…[et al.]. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems Vol. 11, no. 3 (Sep. 2015), pp.308-318.
https://search.emarefa.net/detail/BIM-610401
American Medical Association (AMA)
Jamil, Nursuriati & Ramli, Muhammad Izzad & Siman, Nurayni. Sentence boundary detection without speech recognition : a case of an underresourced language. Journal of Electrical Systems. 2015. Vol. 11, no. 3, pp.308-318.
https://search.emarefa.net/detail/BIM-610401
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references : p. 317-318
Record ID
BIM-610401