TY - JOUR
T1 - Real-time Informatized caption enhancement based on speaker pronunciation time database
AU - Choi, Yong Sik
AU - Kang, Jin Gu
AU - Joo, Jong Wha J.
AU - Jung, Jin Woo
N1 - Publisher Copyright: © 2020, The Author(s).
PY - 2020/12
Y1 - 2020/12
N2 - IBM Watson is a representative speech recognition tool that can automatically generate not only speech-to-text information but also speaker ID and timing information, together called an Informatized Caption. However, if the voice signal sent to the IBM Watson API contains noise, recognition performance decreases significantly. Such noise is common in movies with background music and special sound effects. This paper aims to reduce the inaccuracy of current Informatized Captions in noisy environments. It proposes a method of correcting incorrectly recognized words and a method of enhancing timing accuracy while updating the database in real time, both based on the original caption and the Informatized Caption information. Experimental results show that the proposed method achieves 81.09% timing accuracy on 10 representative animation, horror, and action movies.
AB - IBM Watson is a representative speech recognition tool that can automatically generate not only speech-to-text information but also speaker ID and timing information, together called an Informatized Caption. However, if the voice signal sent to the IBM Watson API contains noise, recognition performance decreases significantly. Such noise is common in movies with background music and special sound effects. This paper aims to reduce the inaccuracy of current Informatized Captions in noisy environments. It proposes a method of correcting incorrectly recognized words and a method of enhancing timing accuracy while updating the database in real time, both based on the original caption and the Informatized Caption information. Experimental results show that the proposed method achieves 81.09% timing accuracy on 10 representative animation, horror, and action movies.
KW - IBM Watson API
KW - Informatized caption
KW - Speaker pronunciation time
KW - Speech to text translation
UR - http://www.scopus.com/inward/record.url?scp=85090309398&partnerID=8YFLogxK
U2 - 10.1007/s11042-020-09590-2
DO - 10.1007/s11042-020-09590-2
M3 - Article
AN - SCOPUS:85090309398
SN - 1380-7501
VL - 79
SP - 35667
EP - 35688
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 47-48
ER -