A Method to Real-Time Update Speaker Pronunciation Time-Database for the Application of Informatized Caption Enhancement by IBM Watson API

Yong Sik Choi, In Hwan Kim, Hyun Mo Yang, Dong Woo Lim, Ailing Lin, Jin Woo Jung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Natural language processing through speech recognition is one of the major fields of AI research. IBM Watson is a representative speech recognition tool: from a voice signal it automatically generates not only the recognized words but also the speaker ID and the timing information of each word, including its start time and end time. However, IBM Watson is not sufficiently robust and readily produces incorrect recognition output when the audio signal contains noise, especially in movies, where background music and special sound effects are mixed in. Some previous studies addressed this problem using the IBM Watson API, on the assumption that a speaker pronunciation time DB had already been properly built. However, building a speaker pronunciation time DB is not easy and is costly. In this paper, to resolve this problem, we introduce an efficient method to build and update the speaker pronunciation time DB in real time.
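The abstract does not describe the update procedure itself, so the following is only a minimal sketch of what incrementally updating such a DB could look like, not the authors' method. It assumes each recognized word arrives as a (speaker ID, word, start time, end time) record, as the abstract says the recognizer provides, and keeps a running mean of pronunciation duration per speaker and word. The names `TimingStats` and `PronunciationTimeDB`, the in-memory storage, and the choice of a running mean are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimingStats:
    """Running statistics for one (speaker, word) pair (hypothetical layout)."""
    count: int = 0
    mean_duration: float = 0.0

    def update(self, duration: float) -> None:
        # Incremental mean: past observations do not need to be stored,
        # so each new word can be folded in as soon as it is recognized.
        self.count += 1
        self.mean_duration += (duration - self.mean_duration) / self.count


class PronunciationTimeDB:
    """Hypothetical in-memory speaker pronunciation time-DB."""

    def __init__(self) -> None:
        self._stats = defaultdict(TimingStats)

    def add(self, speaker: int, word: str, start: float, end: float) -> None:
        """Fold one recognized word (times in seconds) into the DB."""
        if end <= start:
            return  # assumption: ignore malformed timings
        self._stats[(speaker, word.lower())].update(end - start)

    def mean_duration(self, speaker: int, word: str) -> Optional[float]:
        """Return the mean duration for this speaker and word, or None if unseen."""
        stats = self._stats.get((speaker, word.lower()))
        return stats.mean_duration if stats is not None else None


db = PronunciationTimeDB()
db.add(0, "hello", 1.00, 1.50)   # speaker 0 takes 0.50 s
db.add(0, "Hello", 4.00, 4.25)   # speaker 0 takes 0.25 s
db.add(1, "hello", 7.25, 8.25)   # speaker 1 takes 1.00 s
```

Because each update touches only one counter and one mean, the DB can be refreshed word by word as recognition results stream in, which is one plausible reading of "real time" here.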

Original language: English
Title of host publication: Frontier Computing - Theory, Technologies and Applications FC 2018
Editors: Lin Hui, Jason C. Hung, Neil Y. Yen
Publisher: Springer Verlag
Pages: 490-495
Number of pages: 6
ISBN (Print): 9789811336478
DOIs
State: Published - 2019
Event: 6th International Conference on Frontier Computing, FC 2018 - Kuala Lumpur, Malaysia
Duration: 3 Jul 2018 → 6 Jul 2018

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 542
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 6th International Conference on Frontier Computing, FC 2018
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 3/07/18 → 6/07/18

Keywords

  • IBM Watson API
  • Informatized caption
  • Speaker pronunciation time-DB
  • Speech recognition
