Automatic Speech Recognition (ASR) is a technology for identifying uttered words represented as an acoustic signal. An important property of a noise-robust ASR system is its ability to recognise speech accurately in noisy conditions. This paper studies the application of Multi-Nets Artificial Neural Networks (M-N ANNs), a realisation of the multiple-views multiple-learners approach, as Multi-Networks Speech Recognisers (M-NSRs) to provide a real-time, frequency-based noise-robust ASR model. An M-NSR treats the speech features associated with each word as a separate view and assigns a standalone ANN as the learner that approximates that view; by contrast, multiple-views single-learner (MVSL) ANN-based speech recognisers employ a single ANN to memorise the features of the entire vocabulary. In this research, an M-NSR was developed and evaluated on unseen test data corrupted by white, brown, and pink noise; specifically, 27 experiments were conducted on noisy speech to measure the accuracy and recognition rate of the proposed model. The results of the M-NSR were then compared in detail with those of an MVSL ANN-based ASR system. In our experiments, the M-NSR improved the average recognition rate by up to 20.14% on the noise-corrupted test data. The results indicate that the M-NSR, with its higher degree of generalisability, handles frequency-based noise better than the MVSL model, as reflected in its higher recognition rate under noisy conditions.
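The one-network-per-word idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the network topology, the training procedure, the argmax decision rule, the toy Gaussian "features", and every name below (`TinyNet`, `MultiNetRecogniser`, `make_toy_data`, `accuracy`, `VOCABULARY`) are assumptions made for illustration only. Noise is added directly to the feature vectors as a stand-in for noise added to the audio signal.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCABULARY = ["yes", "no", "stop"]  # hypothetical three-word vocabulary


class TinyNet:
    """One-hidden-layer binary network answering: is this my word?"""

    def __init__(self, n_in, n_hidden=8):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def forward(self, X):
        self.H = np.tanh(X @ self.W1 + self.b1)
        return 1.0 / (1.0 + np.exp(-(self.H @ self.W2 + self.b2)))

    def fit(self, X, y, lr=0.5, epochs=500):
        # Plain full-batch gradient descent on the mean cross-entropy loss.
        for _ in range(epochs):
            p = self.forward(X)
            g = (p - y) / len(y)                       # dLoss/d(output pre-activation)
            gH = np.outer(g, self.W2) * (1.0 - self.H ** 2)
            self.W2 -= lr * (self.H.T @ g)
            self.b2 -= lr * g.sum()
            self.W1 -= lr * (X.T @ gH)
            self.b1 -= lr * gH.sum(axis=0)


class MultiNetRecogniser:
    """Multiple learners: one standalone network per vocabulary word."""

    def __init__(self, vocabulary, n_features):
        self.vocabulary = list(vocabulary)
        self.nets = {w: TinyNet(n_features) for w in self.vocabulary}

    def fit(self, X, labels):
        labels = np.asarray(labels)
        for word, net in self.nets.items():
            # Each network sees its own word as positive, all others as negative.
            net.fit(X, (labels == word).astype(float))

    def predict(self, X):
        # Assumed decision rule: pick the word whose network responds most strongly.
        scores = np.column_stack([self.nets[w].forward(X) for w in self.vocabulary])
        return [self.vocabulary[i] for i in scores.argmax(axis=1)]


def make_toy_data(n_per_word, spread=0.3):
    """Toy 4-D 'features': one Gaussian cluster per word (not real speech)."""
    centres = 2.0 * np.eye(len(VOCABULARY), 4)
    X = np.vstack([c + rng.normal(0.0, spread, (n_per_word, 4)) for c in centres])
    labels = [w for w in VOCABULARY for _ in range(n_per_word)]
    return X, labels


def accuracy(model, X, labels):
    return float(np.mean(np.array(model.predict(X)) == np.array(labels)))


if __name__ == "__main__":
    X_train, y_train = make_toy_data(30)
    X_test, y_test = make_toy_data(30)
    model = MultiNetRecogniser(VOCABULARY, X_train.shape[1])
    model.fit(X_train, y_train)
    X_noisy = X_test + rng.normal(0.0, 0.5, X_test.shape)  # white-noise stand-in
    print("clean:", accuracy(model, X_test, y_test))
    print("noisy:", accuracy(model, X_noisy, y_test))
```

An MVSL-style baseline would, under the same assumptions, replace the dictionary of per-word networks with a single network that has one output per word; the sketch is only meant to show how the two designs differ structurally, not to reproduce the reported recognition rates.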
Keywords: Multiple-views multiple-learners; Automatic Speech Recognition; Artificial neural network; Noise robustness; Frequency-based noise
Published by: Neurocomputing, Impact Factor 2.005 (2013), Indexed by Web of Science
Full Title: Real-time frequency-based noise-robust Automatic Speech Recognition using Multi-Nets Artificial Neural Networks: A multi-views multi-learners approach
Link to Full Paper: Neurocomputing
Tuesday, August 4, 2015