
Technology Spotlight

Talk to the Screen

Computer Speech Technology Is on the Move

At the Massachusetts Institute of Technology, researchers and students in the Computer Science and Artificial Intelligence Laboratory have been developing ways to help people and computers have more meaningful dialogue.

As computers increasingly permeate our daily lives, our demand for online information is skyrocketing. Growing numbers of us turn to the Internet to catch up on the latest news, sports, and weather, obtain stock quotes, reserve airline flights, conduct research, or check out what's playing at local theaters. Unfortunately, navigating vast amounts of data to find useful information can require a time-consuming series of keyboard entries and mouse clicks, as well as a measure of technical savvy. But there is a more efficient, more flexible tool available for human-computer interaction, something that even the most technically challenged of us could use anywhere, any time: spoken language.

To make it possible for humans to speak to computers, we need a conversational interface. A conversational interface enables humans to converse with machines (in much the same way we communicate with one another) in order to create, access, and manage information and to solve problems. It is the interface that Hollywood and nearly every "vision of the future" tells us we must have. Since 1989, getting computers to communicate the way people do -- by speaking and listening -- has been the objective of the Spoken Language Systems (SLS) Group at MIT's Computer Science and Artificial Intelligence Laboratory.

How does it work?

Spoken language systems will allow us to have spontaneous conversations with computers

Imagine talking to a computer to find a needle-in-a-haystack job listing, or the show times of a movie premiere at the nearest theater. Today, obtaining such information online requires a programmed transaction between the user, who clicks through a predetermined sequence of options and views the results, and the computer, which retrieves the user-selected data. With spoken language systems, however, user and machine can engage in a spontaneous, interactive conversation, incrementally arriving at the desired information in far fewer steps. A case in point is the following excerpt from a conversation between a user and JUPITER, an SLS-based weather forecast system:


User: Yes, I would like the weather forecast for London, England, please.

JUPITER: In London in England Wednesday, partly cloudy skies with periods of sunshine. High 82 and low 63. Is there something else?

User: What is that in degrees Celsius, please?

JUPITER: In London in England Wednesday, high 28 Celsius and low 17 Celsius. What else?
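JUPITER's on-the-fly unit conversion in this exchange follows the standard Fahrenheit-to-Celsius formula. A quick check (the function name here is ours, purely for illustration, not part of JUPITER):

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# JUPITER's first answer gave a high of 82 and a low of 63 Fahrenheit;
# rounding the conversions reproduces the 28 and 17 Celsius it reports.
print(round(f_to_c(82)))  # 28
print(round(f_to_c(63)))  # 17
```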

SLS researchers make this kind of dialogue look easy by empowering the computer to perform five main functions in real time:

  • Speech Recognition -- converting the user’s speech to a text sentence of distinct words;
  • Language Understanding -- breaking down the recognized sentence grammatically and systematically representing its meaning;
  • Information Retrieval -- obtaining targeted data, based on that meaning representation, from the appropriate online source;
  • Language Generation -- building a text sentence that presents the retrieved data in the user's preferred language; and
  • Speech Synthesis -- converting that text sentence into computer-generated speech.
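The five functions above form a pipeline, each stage consuming the previous stage's output. The following toy sketch shows the flow of data through the stages; every function here is a stub of our own invention (with speech recognition and synthesis reduced to identity functions on text), not the actual SLS or GALAXY code.

```python
# Illustrative five-stage pipeline; all names and logic are hypothetical.

def recognize_speech(audio):
    # 1. Speech Recognition: audio -> text (stubbed as identity on text).
    return audio

def parse_meaning(text):
    # 2. Language Understanding: text -> a structured meaning representation.
    words = text.lower().rstrip("?.").split()
    if "weather" in words:
        return {"intent": "weather", "city": words[-1].capitalize()}
    return {"intent": "unknown"}

def fetch_data(meaning):
    # 3. Information Retrieval: look up data matching the meaning.
    forecasts = {"London": "partly cloudy, high 82, low 63"}
    return forecasts.get(meaning.get("city"), "no data")

def generate_reply(meaning, data):
    # 4. Language Generation: build a text sentence from the data.
    return f"In {meaning['city']}: {data}. Is there something else?"

def synthesize_speech(text):
    # 5. Speech Synthesis: text -> audio (stubbed as identity).
    return text

meaning = parse_meaning(recognize_speech("What is the weather in London?"))
print(synthesize_speech(generate_reply(meaning, fetch_data(meaning))))
# -> In London: partly cloudy, high 82, low 63. Is there something else?
```

The point of the sketch is the separation of concerns: each stage can be improved or replaced independently, which is what makes architectures of this kind attractive research frameworks.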

Throughout the conversation, the computer also remembers previous exchanges. In this example, JUPITER can respond to "What is that in degrees Celsius, please?" because the user has just asked about weather conditions in London. Otherwise, the system would ask the user to clarify the question.
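That discourse memory can be sketched as a small amount of state carried between turns. The class below is entirely illustrative (hardcoded to the London example, with canned answers), not JUPITER's actual discourse component:

```python
# Hypothetical sketch of discourse context: remember the last city so a
# follow-up like "What is that in degrees Celsius?" can be resolved.

class DialogueContext:
    def __init__(self):
        self.last_city = None  # state carried across turns

    def resolve(self, query):
        words = query.lower().rstrip("?.").split()
        if "london" in words:
            self.last_city = "London"
        if self.last_city is None:
            # No prior exchange to anchor the follow-up: ask to clarify.
            return "Please tell me which city you mean."
        if "celsius" in words:
            return f"In {self.last_city}: high 28 C, low 17 C."
        return f"In {self.last_city}: high 82 F, low 63 F."

ctx = DialogueContext()
print(ctx.resolve("What is the weather in London?"))    # Fahrenheit answer
print(ctx.resolve("What is that in degrees Celsius?"))  # uses remembered city
```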

Many speech-based interfaces can be considered conversational. They may be differentiated by the degree to which the system maintains an active role in the conversation, or by the complexity of the potential dialogue. At one extreme are system-initiative, or "directed-dialogue," transactions in which the computer takes complete control of the interaction by requiring that the user answer a set of prescribed questions, much as with touch-tone implementations of interactive voice response (IVR) systems.

In the case of air travel planning, for example, a directed-dialogue system could ask the user to "Please say just the departure city." Because the user's options are severely restricted, such transactions are easier to complete successfully, and indeed there have been successful demonstrations and commercial deployments of such systems.
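A directed dialogue is essentially a fixed sequence of prompts, each filling one slot. The sketch below is loosely modeled on that idea; the prompts, slot names, and string-based "answers" are all our own illustrative assumptions:

```python
# Hypothetical system-initiative ("directed dialogue") flow: the computer
# asks a fixed sequence of questions, one slot per turn.

PROMPTS = [
    ("departure", "Please say just the departure city."),
    ("arrival", "Please say just the arrival city."),
    ("date", "Please say the travel date."),
]

def run_directed_dialogue(answers):
    """Walk the fixed prompt sequence, collecting one slot per answer."""
    filled = {}
    for (slot, prompt), answer in zip(PROMPTS, answers):
        # In a real system the prompt would be spoken aloud and the answer
        # recognized from speech; here both are plain strings.
        filled[slot] = answer
    return filled

booking = run_directed_dialogue(["Boston", "London", "June 3"])
print(booking)
# -> {'departure': 'Boston', 'arrival': 'London', 'date': 'June 3'}
```

The rigidity is visible in the code: the user cannot volunteer the date early or answer two questions at once, which is precisely what keeps recognition easy and the transaction reliable.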

At the other extreme are user-initiative systems in which users have complete freedom in what they say to the system (e.g., "I want to visit my grandmother") while the system remains relatively passive, asking only for clarification when necessary. In this case, the user may feel uncertain about what capabilities exist and may, as a consequence, stray far from the system's domain of competence, leading to great frustration when nothing is understood.

Lying between these two extremes are systems that incorporate a "mixed-initiative," goal-oriented dialogue, in which both the user and the computer participate actively to solve a problem interactively through conversation. This latter mode of interaction is the primary focus of our research.
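The contrast with the directed case can be sketched as slot-filling where the user may volunteer several slots in one utterance, in any order, and the system takes the initiative only for whatever is still missing. Everything below (slot names, the tiny city list, the matching logic) is an illustrative assumption of ours, far simpler than a real dialogue manager such as GALAXY's:

```python
# Hypothetical mixed-initiative slot-filling: the user may supply several
# slots at once; the system asks only about what remains unfilled.

REQUIRED = ["departure", "arrival", "date"]
KNOWN_CITIES = {"boston", "london", "denver"}

def update_slots(slots, utterance):
    """User initiative: fill any slots the utterance mentions, in any order."""
    words = utterance.lower().replace(",", "").split()
    cities = [w for w in words if w in KNOWN_CITIES]
    if cities and slots.get("departure") is None:
        slots["departure"] = cities[0].capitalize()
    if len(cities) > 1 and slots.get("arrival") is None:
        slots["arrival"] = cities[1].capitalize()
    return slots

def next_prompt(slots):
    """System initiative: ask only for whatever is still missing."""
    for slot in REQUIRED:
        if slots.get(slot) is None:
            return f"What is the {slot}?"
    return "All set."

slots = dict.fromkeys(REQUIRED)
update_slots(slots, "I want to fly from Boston to London")
print(next_prompt(slots))  # only the date is still missing
```

One user turn filled two slots, so the system's follow-up question targets the single remaining gap rather than marching through a fixed script.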

MIT developed the prototype for toll-free directory assistance

In 1994, we began developing a conversational architecture called GALAXY that incorporates the necessary human language technologies (speech understanding and generation, discourse and dialogue) to enable advanced research in mixed-initiative interaction. Since then, this open-source architecture has been adopted by many researchers around the world as a framework for conducting research on advanced spoken-dialogue systems. Here at MIT, we have developed many prototype conversational systems, several of which are deployed on toll-free telephone numbers and enable users to access information about, for example, weather forecasts (JUPITER), airline scheduling (PEGASUS), flight planning (MERCURY), Cambridge city locations (VOYAGER), and selected Web-based information (WebGALAXY).

Raising the Level of Human-Computer Conversation

Although tremendous progress has been made over the last decade in developing advanced conversational spoken-language technology, much additional progress must be made before conversational interfaces approach the naturalness of human-human conversation. Today, SLS researchers are refining core human-language technologies and combining speech with other natural input modalities such as pen and gesture. They are working to upgrade the efficiency and naturalness of application-specific conversations, improve new-word detection and learning during speech recognition, increase the portability of core technologies, and develop new applications. As the SLS Group continues to address these issues, it brings us closer to the day when anyone, anywhere, at any time, can interact easily with computers.

Reprinted courtesy: Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory


Sponsored by: National Endowment for the Humanities, William and Flora Hewlett Foundation, Ford Foundation, Arthur Vining Davis Foundations, Carnegie Corporation of New York
