Towards Machine Comprehension of Spoken Content

Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019

Chia-Hsuan Lee, Hung-yi Lee, Szu-Lin Wu, Chi-Liang Liu, Wei Fang, Juei-Yang Hsu, Bo-Hsiang Tseng


```bibtex
@article{lee2019machine,
  title={Machine Comprehension of Spoken Content: TOEFL Listening Test and Spoken SQuAD},
  author={Lee, Chia-Hsuan and Lee, Hung-Yi and Wu, Szu-Lin and Liu, Chi-Liang and Fang, Wei and Hsu, Juei-Yang and Tseng, Bo-Hsiang},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2019},
  publisher={IEEE}
}
```

## Abstract

A user can easily scan through text, but not through spoken content, which cannot be directly displayed on a screen. As a result, accessing large collections of spoken content is far more difficult and time-consuming than accessing text. It would therefore be helpful to develop machines that understand spoken content. In this paper, we propose two new tasks for machine comprehension of spoken content. The first is a TOEFL listening comprehension test, a challenging academic English examination for learners whose native language is not English. We show that the proposed model outperforms naive approaches and other neural-network-based models by exploiting the hierarchical structure of natural language and the selective power of the attention mechanism. For the second listening comprehension task, Spoken SQuAD, we find that speech recognition errors severely impair machine comprehension; we propose the use of subword units to mitigate the impact of these errors.
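To make the subword argument concrete, here is a minimal sketch assuming nothing beyond the abstract: it uses character n-grams as a crude stand-in for learned subword units, with an invented reference word and ASR error, to show why recognition errors hurt less when text is matched at the subword level rather than the word level.

```python
# Toy illustration (hypothetical example, not the paper's implementation):
# character n-grams stand in for learned subword units. A misrecognized word
# usually still shares many subword pieces with the reference, so matching at
# the subword level keeps partial overlap where word-level matching gets zero.

def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams (a crude subword proxy)."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def overlap(ref, hyp, to_units):
    """Fraction of the reference's units that also appear in the hypothesis."""
    ref_units, hyp_units = to_units(ref), to_units(hyp)
    return len(ref_units & hyp_units) / max(len(ref_units), 1)

reference = "photosynthesis"
asr_hypothesis = "photosynthesys"  # invented recognition error

# Word-level matching fails completely ...
print(overlap(reference, asr_hypothesis, lambda w: {w}))          # 0.0
# ... while subword (character n-gram) units still overlap substantially.
print(round(overlap(reference, asr_hypothesis, char_ngrams), 2))  # 0.79
```

Whatever subword inventory is actually used, the intuition is the same: an ASR error that corrupts only part of a word no longer erases all lexical overlap between the spoken passage and the question.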