Audiovisual speaker identification based on lip and speech modalities
Joint Authors
Chelali, Fatimah
Djeradi, Ammar
Source
The International Arab Journal of Information Technology
Issue
Vol. 14, Issue 1 (31 Jan. 2017)
Publisher
Publication Date
2017-01-31
Country of Publication
Jordan
Main Subjects
Information Technology and Computer Science
Topics
- Audiovisual aids
- Psycholinguistics
- Speeches
- Acoustics
- Data processing
- Interactive multimedia
- Public speaking
- Audio amplifiers
Abstract EN
In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the two modality streams processed in parallel.
We also propose a fusion technique that combines the two modalities to make the final recognition decision.
Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers.
Results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT), in addition to the audio information represented by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients.
Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were tested successfully on this dataset, achieving good recognition performance with serial concatenation of the acoustic and visual vectors.
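The serial concatenation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature values, and the per-modality z-score normalization are assumptions made here for illustration, since the abstract does not state how the vectors are scaled before fusion.

```python
from math import sqrt
from typing import List, Sequence


def zscore(vec: Sequence[float]) -> List[float]:
    """Scale one modality's vector to zero mean and unit variance (assumed step)."""
    n = len(vec)
    mean = sum(vec) / n
    std = sqrt(sum((x - mean) ** 2 for x in vec) / n)
    if std == 0.0:
        return [0.0] * n  # constant vector: nothing to scale
    return [(x - mean) / std for x in vec]


def fuse_serial(acoustic: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Feature-level fusion: normalize each modality, then concatenate serially."""
    return zscore(acoustic) + zscore(visual)


# Hypothetical toy values standing in for MFCC/PLP and DCT/DWT coefficients.
acoustic_features = [12.0, -3.5, 4.1, 0.7]
visual_features = [250.0, 31.0, -18.0]

fused = fuse_serial(acoustic_features, visual_features)
print(len(fused))  # 7: four acoustic values followed by three visual values
```

The fused vector would then be passed to a classifier such as an MLP or RBF network. Normalizing each modality first is one common way to keep the modality with the larger numeric range from dominating the concatenated vector.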
American Psychological Association (APA)
Chelali, Fatimah & Djeradi, Ammar. 2017. Audiovisual speaker identification based on lip and speech modalities. The International Arab Journal of Information Technology, Vol. 14, no. 1.
https://search.emarefa.net/detail/BIM-693624
Modern Language Association (MLA)
Chelali, Fatimah & Djeradi, Ammar. Audiovisual speaker identification based on lip and speech modalities. The International Arab Journal of Information Technology Vol. 14, no. 1 (Jan. 2017).
https://search.emarefa.net/detail/BIM-693624
American Medical Association (AMA)
Chelali, Fatimah & Djeradi, Ammar. Audiovisual speaker identification based on lip and speech modalities. The International Arab Journal of Information Technology. 2017. Vol. 14, no. 1.
https://search.emarefa.net/detail/BIM-693624
Data Type
Journal Articles
Language
English
Notes
Includes appendices.
Record ID
BIM-693624