Microsoft does text-to-speech with a twist in China

Tuesday, 13 November, 2012

It’s one thing to convert spoken English into Mandarin text, but to output that written Mandarin as speech in the vocal style of the original speaker is something very new. Yet that’s what happened when Microsoft’s Chief Research Officer Rick Rashid spoke in China at the end of last month. At the 7.35 mark in the video clip here, the crowd of 2,000 Chinese academics and students breaks into wild applause as they hear the English spoken by Rashid turning into machine-voiced Mandarin before their eyes and ears. “In a few years,” Rashid tells them, “we hope we’ll be able to break down the language barriers between people.” What Microsoft Research is pioneering is nothing less than a speech-recognition, translation and generation suite.

Behind all this is a neural network system that reduces word-recognition errors significantly. As a result, Microsoft’s translation engine, Bing Translate, is much better placed to feed intelligible Mandarin text into the speaking machine. The killer app, of course, is the generation of foreign-language speech in a voice like the speaker’s own. Preserving your vocal style in translation means that what you’re saying will be much more obvious to the listener and that discussion in Beijing or Berlin will be all the more productive. Your move, Siri.
