Northern Virginia Resource Center for Deaf & Hard of Hearing Persons

Researchers Revolutionize Closed Captioning

By Lisa Zyga, 3/22/2012

Closed captioning for video has changed little since it was developed in the 1970s. The words spoken by characters or narrators scroll along the bottom of the screen, enabling hearing-impaired viewers, or all viewers when the sound is off, to follow along.

Now a team of researchers from China and Singapore has developed a new closed captioning approach in which the text appears in translucent talk bubbles next to the speaker. The new approach offers several advantages for improving the viewing experience for the more than 66 million people around the world who have hearing impairments.

The researchers, Meng Wang from the Hefei University of Technology in China and colleagues, won the Best Paper Award for their work on the new closed captioning method at the Association for Computing Machinery (ACM) Multimedia Conference in October 2010.

“The whole technique was motivated by solving the difficulties of hearing-impaired viewers in watching videos,” Wang said. “These viewers have difficulty in recognizing who is speaking, so we put scripts around the speaker's face; they have difficulty in tracking scripts, so we synchronously highlight the scripts.”

As the researchers explain, conventional closed captioning can be considered static captioning, since all spoken words are represented in the same way at the bottom of the screen, regardless of who said them or the vocal dynamics. In contrast, the researchers describe their new technique as dynamic captioning, since the text appears in different locations and styles to better reflect the speaker's identity and vocal dynamics. For example, the text is highlighted word by word in synchrony with the speech signals. In addition, a small indicator next to the talk bubble shows the variation of vocal volume.
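The word-by-word highlighting described above can be illustrated with a minimal sketch. It assumes each word already carries a start time aligned to the audio. The article does not describe the researchers' alignment method or data format, so the `WORDS` list, the function names, and the bracket notation are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical word timings: (start time in seconds, word).
# The article does not describe the researchers' actual data format.
WORDS = [(0.0, "Where"), (0.4, "are"), (0.6, "you"), (0.9, "going?")]


def highlighted_index(words, t):
    """Return the index of the word being spoken at time t.

    Returns None if t falls before the first word starts.
    """
    starts = [start for start, _ in words]
    # bisect_right counts the words whose start time is <= t.
    i = bisect_right(starts, t) - 1
    return i if i >= 0 else None


def render(words, t):
    """Mark the current word with brackets as a stand-in for highlighting."""
    current = highlighted_index(words, t)
    return " ".join(
        f"[{word}]" if i == current else word
        for i, (_, word) in enumerate(words)
    )


print(render(WORDS, 0.7))
```

A real player would call something like `render` on every video frame and draw the result inside the talk bubble instead of printing it.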

Moreover, all of these features can be generated automatically, without manual intervention. The engineers developed algorithms that identify the speaker using the video's script file along with lip motion detection. Using a technique called visual saliency analysis, the technology finds a position for the talk bubble that interferes minimally with the visual scene. Professionals can further adjust the generated captions, for example by moving the talk bubbles. When the speaker is off-screen, or a narrator is speaking, the words appear at the bottom of the screen as in static closed captioning. The system estimates the vocal volume of words and phrases by computing the power of the audio signal in 30-millisecond windows.
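The volume estimate can be sketched as follows. The article gives only the 30-millisecond window length, so this example assumes non-overlapping windows and defines power as the mean squared amplitude. The function name `window_power` and the synthetic test signal are illustrative, not taken from the researchers' system.

```python
import math


def window_power(samples, sample_rate, window_ms=30):
    """Mean squared amplitude of each consecutive, non-overlapping window.

    Assumption: the article does not say whether the windows overlap or
    how power is normalized; this is one simple reading.
    """
    size = max(1, int(sample_rate * window_ms / 1000))
    powers = []
    # A trailing partial window is dropped.
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        powers.append(sum(x * x for x in window) / size)
    return powers


# Synthetic signal: 0.3 s of a quiet 200 Hz tone, then 0.3 s of a loud one.
rate = 8000
quiet = [0.1 * math.sin(2 * math.pi * 200 * n / rate) for n in range(2400)]
loud = [0.8 * math.sin(2 * math.pi * 200 * n / rate) for n in range(2400)]
powers = window_power(quiet + loud, rate)

# The loud half shows much higher power than the quiet half.
print(powers[0], powers[-1])
```

A caption renderer could map these per-window values onto the size of the volume indicator next to the talk bubble, averaging the windows that fall within each word or phrase.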

See the full article at