Research Seminar on AI: Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition
Tuesday, 19 October 2021, 16:00
The field of automatic speech recognition (ASR) currently shows two major competing trends: classical and end-to-end approaches. The latter allows a direct mapping from acoustic feature sequences to (sub)word sequences, which offers great simplicity and state-of-the-art performance. Although subword units are the most common label units for end-to-end ASR, a fully acoustic-oriented subword modeling approach is still largely missing. In this talk, Wei Zhou will introduce his recent approach, acoustic data-driven subword modeling (ADSM), which addresses this gap. With a fully acoustic-oriented label design and learning process, ADSM produces acoustically structured subword units and acoustically matched target sequences for subsequent ASR training. Experimental comparison shows that ADSM can outperform other popular subword units in all three major end-to-end ASR systems.
Wei Zhou is a third-year PhD student at the Human Language Technology and Pattern Recognition Group (head: Sen. Prof. Dr.-Ing. Hermann Ney), RWTH Aachen University. He received his M.Sc. degree in multimedia engineering from the University of Erlangen-Nuremberg and his Diploma degree in information technology from FH Lübeck. Before his PhD, he worked in industry as an ASR software engineer. His research focuses mainly on ASR, including acoustic modeling, search and decoding, and end-to-end approaches.