[Photo: Baldy, courtesy Dr. Cliff Nass, Stanford University]

Technology Spotlight

Machine Voices

Synthetic Language
Artificial voices may sound mechanical, but they are getting better all the time, and people respond to them as if they came from real people. Cliff Nass explains how old brains respond, exquisitely, to new technologies, drawing on lessons from his work on new BMWs.

Humans evolved to talk: Speech involves more parts of the brain than any other activity. People with IQ scores as low as 50, or with brains as small as 400 grams (one-third the size of a normal human brain), can speak. By the age of 18 months, children start learning a new word every two hours, and they keep up that pace through adolescence.

Humans also evolved to listen. Four days after birth, babies can distinguish their native language from other languages. Humans are so tuned to speech that the right ear (left brain hemisphere) shows a clear advantage in processing the native language, while the left ear (right hemisphere) attends to all other sounds.

Language arguably evolved primarily to transmit social information. People rapidly categorize voices by gender, personality, emotion, place of origin and the identity of the speaker, drawing on such speech characteristics as pitch, cadence, volume, pitch range and speaking rate. Each of these categories guides us on whom to like, whom to trust and with whom to do business. Sensitivity to voice and language cues has played a critical role in interpersonal interaction for as long as we have lived in social groups.

Technology is adding a new dimension to language. People routinely use voice input and voice output systems to check airline reservations, order stocks, control cars, play games, dictate text into a word processor and perform a host of other tasks. Consumers can “converse” with handheld and mobile devices as well as household appliances. Because voice interfaces tap into our highly developed speaking and listening skills, they are intrinsically comfortable, easy to use and efficient.
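
The voice output half of such systems is now commodity software. As one minimal sketch (my own illustration; the article names no particular toolkit), here is an airline-style spoken prompt generated in Python with the open-source pyttsx3 text-to-speech library. The prompt text and the rate and volume settings are assumed values chosen for demonstration:

```python
# A minimal text-to-speech sketch using the open-source pyttsx3
# library, which drives the speech engine already installed on the
# host (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on
# Linux). The prompt and property values are illustrative only.
import pyttsx3

engine = pyttsx3.init()

# Rate and volume are among the very cues (cadence, loudness) that
# listeners use to judge a voice, so designers set them deliberately.
engine.setProperty("rate", 160)    # speaking rate, in words per minute
engine.setProperty("volume", 0.9)  # loudness, on a 0.0-1.0 scale

engine.say("Your flight departs at seven forty-five from gate twelve.")
engine.runAndWait()  # block until the utterance finishes playing
```

Voice input works the same way in reverse: a recognizer turns the caller's speech into text that the application then interprets.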

How does our old brain react to new technologies? More than 50 research studies done in my lab and in others around the world show that people behave toward, and draw conclusions about, voice-based technology using the same rules and shortcuts that they normally apply to other people. Technological voices, just like the voices of other people, activate the parts of the brain associated with social interaction.

Voice interfaces turn out to be social interfaces. Here are some findings of interest about technology-based voices, which inform the work of designers and the choices of consumers:

  • Even machine-synthesized voices exhibit personality: Extroverts prefer products described by a voice that seems extroverted, while introverts buy more after listening to a voice that seems introverted (see the sketch after this list).
  • A happy voice in a car will make happy people drive better, while people who are upset drive better with a subdued voice.
  • When a technology doesn’t “understand” what a person said, it should never apologize — modesty makes the system seem dumb.
  • People are more creative and honest when speaking into a wireless microphone than when they speak into a headset microphone.
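
The personality-matching result in the first bullet suggests a concrete design rule: estimate how extroverted the user is, then mirror that trait in the synthetic voice. The sketch below is hypothetical; the VoiceProfile type, the parameter values and the 0.5 cutoff are my own illustrations of the idea, not Nass's published method:

```python
# A hypothetical sketch of the "similarity attraction" finding above:
# match the synthetic voice's apparent personality to the listener's.
# All names and numbers here are illustrative, not from the article.
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    rate: int         # words per minute; extroverted voices speak faster
    volume: float     # 0.0-1.0; extroverted voices are louder
    pitch_range: str  # wide pitch variation reads as extroverted

EXTROVERTED = VoiceProfile(rate=180, volume=0.9, pitch_range="wide")
INTROVERTED = VoiceProfile(rate=130, volume=0.6, pitch_range="narrow")

def choose_voice(user_extraversion: float) -> VoiceProfile:
    """Pick the voice persona that mirrors the user's own personality.

    user_extraversion is a 0.0-1.0 score, e.g. from a short
    personality questionnaire at setup time.
    """
    return EXTROVERTED if user_extraversion >= 0.5 else INTROVERTED

if __name__ == "__main__":
    print(choose_voice(0.8))  # -> extroverted persona
    print(choose_voice(0.2))  # -> introverted persona
```

In a real product, the extraversion score might come from a brief questionnaire at setup, or be inferred from the user's own speech characteristics.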

To draw these conclusions, we start with an appreciation of the evolutionary grounding of speech. To confidently offer a design idea or a psychological principle, we test and refine numerous hypotheses. In the accompanying essay, I'll talk about the research behind the selection of a voice for use in the BMW 5 Series.

Suggested Reading/Additional Resources

  • V. Zue and J. Glass, “Conversational Interfaces: Advances and Challenges,” Proceedings of the IEEE, Special Issue on Spoken Language Processing, Vol. 88, August 2000. (PDF)
  • J. Glass and S. Seneff, “Flexible and Personalizable Mixed-Initiative Dialogue Systems,” presented at the HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing, Edmonton, Canada, May 2003. (PDF)
  • V. Zue, et al., “JUPITER: A Telephone-Based Conversational Interface for Weather Information,” IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000. (PDF)
  • A list of links provided by the American Association for Artificial Intelligence (AAAI) for students, teachers, journalists and anyone who would like to learn what artificial intelligence is and what AI scientists do.
  • Speech Recognition vs. Speech Understanding: a comprehensive overview from MIT Press describing the various components of computer speech understanding technologies.

  • Cliff Nass is a professor at Stanford University. His primary appointment is in the Department of Communication, with courtesy appointments in Computer Science; Science, Technology and Society; Sociology; and Symbolic Systems (cognitive science). Dr. Nass is a director of the Social Responses to Communication Technologies (SRCT) Project at Stanford University. Recent research in this area includes how people apply social rules and heuristics to ubiquitous technologies in support of office, meeting and classroom interaction; alignment in natural language understanding; mixed agent and avatar systems for learning; voice interfaces and driving (particularly drowsy driving); and human-robot interaction. Dr. Nass is also one of two directors of the Kozmetsky Global Collaboratory (KGC) at Stanford University and its Real-time Venture Design Laboratory (ReVeL). Recent research in this area includes links between personal identity and venture sustainability; information technology and development; and compression in small groups. Dr. Nass is the author of The Media Equation: How People Treat Computers, Televisions, and New Media Like Real People and Places (New York: Cambridge University Press) and of an upcoming book, Voice Activated: The Psychology and Design of Interfaces that Talk and Listen.
