| | | | | |       Listen to Me       | | | | | |

The Kurzweil Applied Intelligence Alumni Newsletter



Wednesday May 16, 12:15 am Eastern Time

All Talk, No Action

By Mark Boslet

W.S. "Ozzie" Osborne is sitting on a Tokyo train, talking aloud to his laptop - enun-ci-at-ing each syl-la-ble. As a traveling executive and one of the brains behind IBM's ViaVoice software, he regularly dictates short e-mail messages and rough drafts of research papers to his computer, relying on his own program to translate spoken words into written text.

But the voice expert with the rock-star name is realistic about the software he helped develop. ViaVoice is a "limited tool," he admits, not yet ready to handle long e-mail messages or documents that need to be perfect the first time around. His advice to ViaVoice users: "Just start talking, and then come back later and clean it up."

That about sums up how far voice recognition software has come - and how far it still has to go. For years, pundits have foreseen a day when we'll toss out our keyboards and talk directly to our computers. Users would be untethered from their PCs. Millions of novices would join the computer revolution, no longer intimidated by complex interfaces. Huge new markets would open up.

But even as Microsoft prepares to release its first speech product, software engineers admit that PC-based programs will probably never be sophisticated enough to replace the keyboard completely. Meanwhile, research efforts are shifting away from the PC. Instead of teaching the computer to understand every word said, technologists are focusing on so-called network applications, accessible from phones and other handheld devices, with limited vocabularies and narrow capabilities.

All speech-recognition programs work essentially the same way. A microphone translates sounds into electrical signals, and the computer turns those signals into a stream of 1s and 0s. The software examines this stream, looking for patterns it can identify as specific sounds. It then searches through a dictionary for words with the same sounds in the same order, assigning each match a probability that it is the correct choice.
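The dictionary-lookup step described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the sound labels, the tiny `PRONUNCIATIONS` dictionary and the `rank_candidates` function are all hypothetical, and the similarity measure (Python's `difflib.SequenceMatcher`) is a stand-in for the acoustic models a real product would use.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciation dictionary: word -> sequence of sound labels.
# A real product would have hundreds of thousands of entries.
PRONUNCIATIONS = {
    "the": ["DH", "AH"],
    "here": ["HH", "IY", "R"],
    "hear": ["HH", "IY", "R"],
    "hair": ["HH", "EH", "R"],
    "her": ["HH", "ER"],
}


def rank_candidates(sounds, dictionary=PRONUNCIATIONS):
    """Return (word, probability) pairs, most likely first.

    Each dictionary word is scored by how closely its sounds, in order,
    match the sounds that were heard. Scores are then normalized so the
    probabilities of all candidates sum to 1.
    """
    scores = {}
    for word, pronunciation in dictionary.items():
        score = SequenceMatcher(None, sounds, pronunciation).ratio()
        if score > 0:  # drop words that share no sounds at all
            scores[word] = score
    total = sum(scores.values())
    ranked = [(word, score / total) for word, score in scores.items()]
    # Highest probability first; ties are broken alphabetically.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for word, probability in rank_candidates(["HH", "IY", "R"]):
        print(f"{word}: {probability:.2f}")
```

In this sketch "hear" and "here" come out with exactly the same probability, because their sound sequences are identical. Sound alone cannot separate them.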

That last step is the tricky part. It's not hard for a computer to recognize an unambiguous word like "the." But consider how difficult it is to distinguish between "here" and "hear," which sound exactly alike. That demands an understanding of context, which requires intelligence. Today's programs try to be that smart, but they aren't there yet.
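One simple way a program can approximate context is to look at the word that came just before. The sketch below assumes a made-up table of word-pair counts (`BIGRAM_COUNTS`) and a hypothetical `pick_by_context` function. It illustrates the general idea and does not describe how ViaVoice or any other product actually works.

```python
# Hypothetical counts of how often a word follows the previous word.
# The numbers are invented for illustration, not measured from real text.
BIGRAM_COUNTS = {
    ("come", "here"): 40,
    ("come", "hear"): 5,
    ("can", "hear"): 50,
    ("can", "here"): 1,
}


def pick_by_context(previous_word, candidates, counts=BIGRAM_COUNTS):
    """Choose among same-sounding words using the preceding word.

    Adds 1 to every count (add-one smoothing) so that a word pair never
    seen before still gets a small, nonzero probability.
    """
    weights = {w: counts.get((previous_word, w), 0) + 1 for w in candidates}
    total = sum(weights.values())
    probabilities = {w: weights[w] / total for w in candidates}
    best = max(candidates, key=lambda w: probabilities[w])
    return best, probabilities


if __name__ == "__main__":
    print(pick_by_context("come", ["here", "hear"])[0])
    print(pick_by_context("can", ["here", "hear"])[0])
```

A single preceding word is a very thin slice of context, which is part of why this kind of guessing still falls short of real understanding.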

They may never get there. The average speech program today boasts a 90 percent accuracy rate out of the box. You can improve that through training, by reading to your computer so the software learns to recognize the idiosyncrasies of your speech. But even then, accuracy can be undermined by ambient noise, cheap microphones and the vagaries of individual speech - imperfect grammar, incomplete sentences, regional dialects and the differences wrought by gender and age. Researchers say these variables make the speech problem intractable.

"If you restrict some of the variables - if, for example, you insist on good microphones and a limited vocabulary - you can take care of the other ones," says Microsoft senior researcher Alex Acero. "It's only when you have no constraints that it's really difficult." While speech products have improved dramatically in recent years, Acero says most of the advances are due to increased computer power, not improvements in the underlying technology.

The future of speech recognition depends on the R&D commitments of a few vendors. With 30 million copies of ViaVoice sold since 1996, IBM's Osborne says Big Blue is "in the business for the long haul," committing what he will only call a "sizable" sum to broaden the software's capabilities.

On the other hand, Lernout & Hauspie (which sells Dragon NaturallySpeaking) has more pressing concerns: After overstating sales from 1998 through 2000, the Belgian company is in bankruptcy court and plans to sell pieces of its business. Company founders Jo Lernout and Pol Hauspie were recently arrested and charged with fraud and stock manipulation.

Meanwhile, Microsoft is including speech capabilities in Office XP, the latest version of its popular software suite, set to ship later this month. In rolling out its first speech product, Microsoft has been careful to temper expectations. It's targeting three categories of users: slow typists; users with disabilities such as carpal tunnel syndrome; and people communicating in intricate written languages such as Chinese and Japanese that require complicated keyboards and scores of characters. "Speech is still an art," says Microsoft product manager David Jaffe. "You still have to become good at dictating to your computer. If you're a great typist, you're not going to see a big impact."

Despite the glitches, the market for speech recognition software is expected to grow from $100 million last year to $2.5 billion by 2005, according to the Giga Information Group. But the market for network voice applications and services will dwarf it, with a potential for sales of $47 billion. These network apps range from United Airlines' voice-activated phone service for flight departure and arrival information, to offerings from Tellme Networks and BeVocal that let users navigate menus of traffic reports, stock quotes and news over the phone. One company, Sound Advantage, is promoting a service that provides access to e-mail and faxes by phone; a server on the network reads the messages to callers.

Speech works better on the network because the system doesn't have to understand as many words. While a desktop program may have a vocabulary of more than 200,000 words, a network app may need only 40,000. The United Airlines flight line, for example, needs only to understand city names and a few other preprogrammed sets of user responses. The smaller the vocabulary, the smaller the database the program must search to find a match, and the less likely it'll run into ambiguities. If a network program doesn't recognize what you've said, it can always ask you to repeat it or kick you over to a human attendant.
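The small-vocabulary approach, including the fallback to a repeat request or a human attendant, can be sketched as follows. Everything here is an assumption made for illustration: the `CITY_NAMES` list, the `handle_utterance` function, the 0.8 cutoff and the two-attempt limit are hypothetical. Text matching with Python's `difflib.get_close_matches` stands in for real acoustic matching.

```python
from difflib import get_close_matches

# Hypothetical vocabulary for a flight-information line.
CITY_NAMES = ["boston", "chicago", "denver", "tokyo"]


def handle_utterance(heard, attempts_so_far, vocabulary=CITY_NAMES,
                     cutoff=0.8, max_attempts=2):
    """Match what was heard against a small vocabulary.

    Returns ("matched", word) on a confident match, ("repeat", None) to
    ask the caller to say it again, or ("attendant", None) to hand the
    call to a person once max_attempts has been used up.
    """
    matches = get_close_matches(heard.lower(), vocabulary, n=1, cutoff=cutoff)
    if matches:
        return ("matched", matches[0])
    if attempts_so_far + 1 < max_attempts:
        return ("repeat", None)
    return ("attendant", None)


if __name__ == "__main__":
    print(handle_utterance("Bostn", attempts_so_far=0))
    print(handle_utterance("xyzzy", attempts_so_far=0))
    print(handle_utterance("xyzzy", attempts_so_far=1))
```

With only a handful of words to choose from, a slightly garbled input still lands on the right entry, and anything too far off is rejected instead of being guessed at.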

Despite the rise of network applications, desktop speech recognition still fascinates researchers. "We've got to change the whole user interface," says Osborne. "That's really where the future is." Speech is just one part of that new interface. Touch and advanced graphics will also play a role - as will a new dose of network smarts. None of these technologies alone will revolutionize the PC. But all of them together just might.

© Yahoo! Inc. All rights reserved.


May 22, 2001