Speech Isolation and Recognition in Crowded Noise Using a Dual-Path Recurrent Neural Network
DOI: https://doi.org/10.24996/ijs.2024.65.10.37

Keywords: speech separation, dual-path recurrent neural network, long short-term memory, Time-Domain Audio Separation Network, LibriMix dataset

Abstract
Speech separation is crucial for effective speech processing in multi-talker conditions, especially in real-time, low-latency applications. In this study, the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN) were used to perform time-domain multi-speaker speech separation. Conventional recurrent neural networks (RNNs) cannot accurately model very long sequences, and one-dimensional convolutional neural networks (1-D CNNs) cannot perform utterance-level sequence modeling when the sequence length exceeds their receptive field. DPRNN splits the long sequential input into smaller chunks and iteratively applies intra-chunk and inter-chunk operations, so that each operation processes an input whose length is proportional to the square root of the original sequence length. The resulting model is more efficient than earlier systems and improves performance on the LibriMix dataset. Experiments show that DPRNN with sample-level, time-domain audio separation can replace existing methods; EEND-SS and other separation algorithms perform worse than DPRNN. The proposed model achieved an SI-SDR of 12.376, a STOI (short-time objective intelligibility) of 0.969, an SDR of 12.363, a DER of 9.363, and an SCA of 97.193.
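The chunking idea can be illustrated with a short PyTorch sketch. This is an illustrative reconstruction of the general DPRNN dual-path scheme, not the configuration used in this study: the feature size, chunk length, hidden size, and the helper names segment and DualPathBlock are assumptions chosen for demonstration. The encoded mixture is split into 50%-overlapping chunks, a bidirectional LSTM is applied within each chunk (intra-chunk), and another is applied across chunks at each position (inter-chunk), so each recurrent pass sees a length roughly proportional to the square root of the original sequence length.

```python
# Minimal sketch of DPRNN-style dual-path processing (PyTorch).
# Hyperparameters and helper names are illustrative assumptions, not the
# values or implementation used in the paper.
import torch
import torch.nn as nn


def segment(x, chunk_size):
    """Split a sequence [batch, features, time] into 50%-overlapping chunks.

    Returns [batch, features, chunk_size, n_chunks], so each recurrent pass
    operates on lengths roughly proportional to sqrt(time).
    """
    hop = chunk_size // 2
    batch, feats, time = x.shape
    # Zero-pad so the sequence divides evenly into overlapping chunks.
    pad = (hop - (time - chunk_size) % hop) % hop
    x = nn.functional.pad(x, (hop, hop + pad))
    chunks = x.unfold(dimension=-1, size=chunk_size, step=hop)  # [B, F, n_chunks, chunk]
    return chunks.permute(0, 1, 3, 2)                           # [B, F, chunk, n_chunks]


class DualPathBlock(nn.Module):
    """One dual-path block: a BLSTM over each chunk (intra-chunk), then a
    BLSTM across chunks at each position (inter-chunk), each followed by a
    linear projection, normalization, and a residual connection."""

    def __init__(self, feats=64, hidden=128):
        super().__init__()
        self.intra_rnn = nn.LSTM(feats, hidden, batch_first=True, bidirectional=True)
        self.intra_proj = nn.Linear(2 * hidden, feats)
        self.intra_norm = nn.GroupNorm(1, feats)
        self.inter_rnn = nn.LSTM(feats, hidden, batch_first=True, bidirectional=True)
        self.inter_proj = nn.Linear(2 * hidden, feats)
        self.inter_norm = nn.GroupNorm(1, feats)

    def forward(self, x):  # x: [B, F, chunk, n_chunks]
        b, f, k, s = x.shape
        # Intra-chunk pass: model local structure within each chunk.
        intra = x.permute(0, 3, 2, 1).reshape(b * s, k, f)
        intra = self.intra_proj(self.intra_rnn(intra)[0])
        intra = intra.reshape(b, s, k, f).permute(0, 3, 2, 1)
        x = x + self.intra_norm(intra.reshape(b, f, k * s)).reshape(b, f, k, s)
        # Inter-chunk pass: model global structure across chunks.
        inter = x.permute(0, 2, 3, 1).reshape(b * k, s, f)
        inter = self.inter_proj(self.inter_rnn(inter)[0])
        inter = inter.reshape(b, k, s, f).permute(0, 3, 1, 2)
        x = x + self.inter_norm(inter.reshape(b, f, k * s)).reshape(b, f, k, s)
        return x


# Example: segment an encoded mixture and apply one dual-path block.
features = torch.randn(2, 64, 16000)        # [batch, features, time]
chunks = segment(features, chunk_size=250)  # chunk length on the order of sqrt(time)
out = DualPathBlock(feats=64)(chunks)
print(out.shape)                            # [2, 64, 250, n_chunks]
```

In a full separator, several such blocks are stacked, the chunks are overlap-added back into a sequence, and a mask per speaker is estimated from the result, following the TasNet encoder-mask-decoder structure.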
License
Copyright (c) 2024 Iraqi Journal of Science
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.