Punctuation Prediction for Audio Speech Transcripts
Abstract: Transcripts generated from audio or video files require proper punctuation to be readable. In this paper, we investigate the use of different deep learning architectures for automated punctuation of transcripts. Specifically, we construct five different architectures, each trained on punctuated transcripts sourced from TED talks. We explore each architecture in detail and highlight its benefits and deficiencies by analyzing its performance at predicting three chosen punctuation marks. Furthermore, we analyze how the training data affects the performance of the neural network architectures.
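Before any architecture can be trained, the punctuated transcripts have to be turned into model inputs and targets. The following is a minimal sketch of one common framing, per-word classification, and rests on assumptions. Each word is labeled with the punctuation mark that follows it, or "O" if none does. The three marks used here (comma, period, question mark) are an assumption, because the abstract does not name the chosen marks. The function `to_training_pairs` and the label names are hypothetical and are not the authors' preprocessing code.

```python
# Hypothetical label set: the abstract does not say which three marks were used.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_training_pairs(text: str) -> tuple[list[str], list[str]]:
    """Split punctuated text into lowercased words and per-word labels.

    Each word receives the label of the tracked punctuation mark that
    immediately follows it, or "O" when no tracked mark follows.
    """
    words: list[str] = []
    labels: list[str] = []
    for token in text.split():
        label = "O"
        # Strip a trailing tracked punctuation mark, if present.
        if token and token[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[token[-1]]
            token = token[:-1]
        word = token.lower()
        if word:
            words.append(word)
            labels.append(label)
        elif label != "O" and labels:
            # A free-standing mark (e.g. "word ?") labels the previous word.
            labels[-1] = label
    return words, labels


words, labels = to_training_pairs("So, what is a transcript? It is text.")
print(words)   # unpunctuated input sequence the model would see
print(labels)  # per-word targets the model would learn to predict
```

In this framing, the unpunctuated word sequence is the input that an automatic transcript would provide, and any sequence model can be trained to emit one label per word. The five architectures compared in the project may use a different input representation.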
This was the final project for Georgia Tech’s Deep Learning course (CS7643) during the Spring 2020 semester. The final .pdf write-up of the project can be read here.