Computer Speech Technology is on the Move

[Sidebar: We react to computer voices in the same way we react to human ones. Cliff Nass explains that when the face on a computer screen and the voice we hear appear mismatched, we automatically distrust it.]

[Diagram: Inside a Conversational Computer. Each interaction with a hybrid language-processing system involves a series of steps, from understanding speech to generating a response.]
At the Massachusetts Institute of Technology, researchers and students in the Computer Science and Artificial Intelligence Laboratory have been developing ways to help people and computers have more meaningful dialogue.
As computers increasingly permeate our daily lives, our demand for online information is skyrocketing. Growing numbers of us turn to the Internet to catch up on the latest news, sports, and weather, obtain stock quotes, reserve airline flights, conduct research, or check out what's playing at local theaters. Unfortunately, navigating through vast amounts of data to obtain useful information can require technical savvy and a time-consuming series of keyboard entries and mouse clicks. But there is a more efficient, more flexible tool available for human-computer interaction, something that even the most technically challenged of us could use anywhere, any time: spoken language.
To make it possible for humans to speak to computers, we need a conversational interface. A conversational interface enables humans to converse with machines (in much the same way we communicate with one another) in order to create, access, and manage information and to solve problems. It is what Hollywood and every “vision of the future” tells us we will have. Since 1989, getting computers to communicate the way people do -- by speaking and listening -- has been the objective of the Spoken Language Systems (SLS) Group at MIT’s Computer Science and Artificial Intelligence Laboratory.
How does it work?
Spoken language systems will allow us to have spontaneous conversations with computers
User: Yes, I would like the weather forecast for London, England, please.
JUPITER: In London in England Wednesday, partly cloudy skies with periods of sunshine. High 82 and low 63. Is there something else?
User: What is that in degrees Celsius, please?
JUPITER: In London in England Wednesday, high 28 Celsius and low 17 Celsius. What else?
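JUPITER's Celsius reply is a straight unit conversion of the Fahrenheit forecast. A minimal sketch of that arithmetic (the function name is illustrative, not part of the system):

```python
def f_to_c(deg_f: float) -> int:
    """Convert Fahrenheit to Celsius, rounded to the nearest degree."""
    return round((deg_f - 32) * 5 / 9)

# The forecast above: high 82 F and low 63 F
print(f_to_c(82))  # 28, matching JUPITER's "high 28 Celsius"
print(f_to_c(63))  # 17, matching "low 17 Celsius"
```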
SLS researchers make this kind of dialogue look easy by empowering the computer to perform five main functions in real time: recognizing the user's speech, understanding what the words mean, retrieving the requested information, composing a reply, and converting that reply back into speech.
Throughout the conversation, the computer also remembers previous exchanges. In this example, JUPITER can respond to “What is that in degrees Celsius, please?” because the user has just asked about weather conditions in London. Otherwise, the system would ask the user to clarify the question.
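A toy sketch of those real-time stages, including the discourse memory that lets a system resolve “that” to the previous London forecast. All names and stubs here are illustrative; the actual SLS architecture is far more sophisticated:

```python
# Toy conversational pipeline: understand the utterance (using dialogue
# history), retrieve the answer, and generate a reply. Speech recognition
# and synthesis are omitted; text stands in for audio.

def understand(words: str, history: list) -> dict:
    """Parse the user's words into a query, using history to
    resolve context-dependent requests like 'that in Celsius'."""
    query = {"city": None, "units": "F"}
    if "celsius" in words.lower():
        query["units"] = "C"
    if "london" in words.lower():
        query["city"] = "London"
    elif history:
        # Discourse memory: inherit the city from the previous exchange.
        query["city"] = history[-1]["city"]
    return query

def retrieve(query: dict) -> dict:
    """Look up the forecast (a canned table stands in for a live source)."""
    forecast_f = {"London": {"high": 82, "low": 63}}
    data = forecast_f[query["city"]]
    if query["units"] == "C":
        return {k: round((v - 32) * 5 / 9) for k, v in data.items()}
    return data

def respond(query: dict, data: dict) -> str:
    """Generate a natural-language reply (synthesis would come next)."""
    unit = " Celsius" if query["units"] == "C" else ""
    return f"In {query['city']}, high {data['high']}{unit} and low {data['low']}{unit}."

history = []
for utterance in ["Weather for London, England, please",
                  "What is that in degrees Celsius, please?"]:
    q = understand(utterance, history)
    print(respond(q, retrieve(q)))
    history.append(q)
```

The second pass answers “high 28 Celsius and low 17 Celsius” even though the utterance never names a city, because the frame from the first exchange is still in the history.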
Many speech-based interfaces can be considered conversational. They may be differentiated by the degree to which the system maintains an active role in the conversation, or by the complexity of the potential dialogue. At one extreme are system-initiative, or “directed-dialogue,” transactions in which the computer takes complete control of the interaction by requiring that the user answer a set of prescribed questions, much as with touch-tone implementations of interactive voice response (IVR) systems.
In the case of air travel planning, for example, a directed-dialogue system could ask the user to “Please say just the departure city.” Because the user's options are severely restricted, it is easier to successfully complete such transactions, and indeed there have been some successful demonstrations and commercial deployment of such systems.
At the other extreme are user-initiative systems in which users have complete freedom in what they say to the system (e.g., “I want to visit my grandmother”), while the system remains relatively passive, asking only for clarification when necessary. In this case, the user may feel uncertain about what capabilities exist and may, as a consequence, stray quite far from the system's domain of competence, leading to great frustration because nothing is understood.
Lying between these two extremes are systems that incorporate a “mixed-initiative,” goal-oriented dialogue, in which both the user and the computer participate actively to solve a problem interactively using a conversational paradigm. This latter mode of interaction is the primary focus of our research.
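The contrast can be made concrete with a toy slot-filling dialogue manager for the air-travel example: a directed-dialogue system would ask for each item in a fixed order, while a mixed-initiative one lets the user volunteer several items at once and prompts only for what is still missing. Everything below is an illustrative sketch, not the SLS implementation:

```python
# Toy mixed-initiative dialogue manager for air-travel planning.
# The system tracks which slots are filled; the user may fill any
# subset of slots in a single utterance.

SLOTS = ["departure", "destination", "date"]
CITIES = ["boston", "denver", "london"]

def parse(utterance: str, frame: dict) -> None:
    """Crude keyword spotting: fill whatever slots the utterance mentions."""
    words = utterance.lower()
    for city in CITIES:
        if f"from {city}" in words:
            frame["departure"] = city
        elif f"to {city}" in words:
            frame["destination"] = city
    for day in ["monday", "tuesday", "friday"]:
        if day in words:
            frame["date"] = day

def next_prompt(frame: dict) -> str:
    """Mixed initiative: ask only for the first still-missing slot."""
    for slot in SLOTS:
        if slot not in frame:
            return f"What is the {slot}?"
    return "Booking flight from {departure} to {destination} on {date}.".format(**frame)

frame = {}
parse("I want to fly from Boston to Denver", frame)  # fills two slots at once
print(next_prompt(frame))   # the system asks only for the missing date
parse("on Friday", frame)
print(next_prompt(frame))   # all slots filled: the goal is reached
```

A directed-dialogue version would instead ignore volunteered information and ask “Please say just the departure city,” then the destination, then the date, one prescribed question at a time.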
[Sidebar: MIT developed the prototype for toll-free directory assistance]
Although tremendous progress has been made over the last decade in developing advanced conversational spoken-language technology, much more is needed before conversational interfaces approach the naturalness of human-human conversation. Today, SLS researchers are refining core human-language technologies and combining speech with other natural input modalities such as pen and gesture. They are working to improve the efficiency and naturalness of application-specific conversations, to strengthen the detection and learning of new words during speech recognition, and to increase the portability of core technologies and develop new applications. As the SLS Group continues to address these issues, it brings us closer to the day when anyone, anywhere, any time, can interact easily with computers.
Reprinted courtesy: Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory
William and Flora Hewlett Foundation
© COPYRIGHT 2005 MACNEIL/LEHRER PRODUCTIONS. All Rights Reserved.