JOURNAL OF MULTIMEDIA (JMM)
ISSN : 1796-2048
Volume : 4 Issue : 4 Date : August 2009
Integration of Metamodel and Acoustic Model for Dysarthric Speech Recognition
Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, I-Chao LI, and Toshitaka Nakabayashi
Full Text: PDF (952 KB)
We investigated the speech recognition of a person with articulation disorders resulting from
athetoid cerebral palsy. The articulation of the first words spoken tends to be unstable due to the
strain placed on the speech-related muscles, and this degrades speech recognition performance.
We therefore proposed a robust feature extraction method based on PCA (Principal Component
Analysis) instead of MFCC, in which the stable elements of an utterance are projected onto
low-order features and the fluctuations of speaking style onto high-order features. The PCA-based
filter can thus extract only the stable utterance features.
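The PCA-based filtering idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, the number of retained components, and the random toy data are all assumptions for demonstration.

```python
import numpy as np

def pca_filter(features, n_keep):
    """Project spectral feature frames onto the n_keep largest-variance
    principal components, keeping the stable utterance elements
    (low-order axes) and discarding speaking-style fluctuation
    (high-order axes). Hypothetical sketch, not the paper's code."""
    # features: (num_frames, dim) matrix of spectral features
    mean = features.mean(axis=0)
    centered = features - mean
    # eigen-decomposition of the feature covariance matrix
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the
    # n_keep axes with the largest variance (the stable elements)
    order = np.argsort(eigvals)[::-1][:n_keep]
    basis = eigvecs[:, order]          # (dim, n_keep)
    return centered @ basis            # low-order projection only

# toy usage: 200 frames of 12-dim features reduced to 4 stable axes
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 12))
stable = pca_filter(feats, 4)
print(stable.shape)  # (200, 4)
```

In practice the projection basis would be learned from training utterances of the target speaker and then applied to test frames.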
The fluctuation of speaking style may induce phone-level fluctuations such as substitutions,
deletions, and insertions. In this paper, we discuss our effort to integrate a metamodel approach
with an acoustic-model approach. Metamodels provide a technique for incorporating a model of a
speaker's confusion matrix into the ASR process in a way that increases recognition accuracy.
Integrating metamodels with acoustic models suppresses fluctuation not only during feature
extraction but also during recognition. The proposed method improved the recognition rate by
9.9 points (from 79.1% to 89.0%) compared to the conventional method.
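The confusion-matrix metamodel idea can be illustrated with a small rescoring sketch. The confusion probabilities, interpolation weight, and function names below are all hypothetical, chosen only to show how a speaker-specific confusion matrix could be combined with acoustic-model scores.

```python
import math

# Toy speaker-specific confusion matrix: P(observed phone | intended phone).
# A real metamodel would estimate these from the speaker's recognition errors.
confusion = {
    "t": {"t": 0.7, "d": 0.2, "k": 0.1},
    "d": {"d": 0.8, "t": 0.2},
}

def combined_score(acoustic_logprob, intended, observed, weight=0.5):
    """Interpolate the acoustic-model log-likelihood with the
    confusion-matrix log-probability (illustrative weighting only)."""
    p = confusion.get(intended, {}).get(observed, 1e-6)  # floor for unseen pairs
    return (1 - weight) * acoustic_logprob + weight * math.log(p)

# usage: given an observed phone, pick the intended phone that
# maximizes the combined acoustic + confusion score
observed = "d"
acoustic = {"t": -1.2, "d": -1.0}  # toy acoustic log-likelihoods
best = max(acoustic, key=lambda ph: combined_score(acoustic[ph], ph, observed))
print(best)
```

The interpolation lets frequent, systematic substitutions (e.g. an intended /t/ often realized as [d]) be recovered at recognition time rather than counted as errors.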
Keywords: dysarthric speech recognition, feature extraction, model integration