Getting Computers to See

  • Posted 02.09.11
  • NOVA

Building vision into computers has been a greater hurdle than AI experts ever thought it would be, and they haven't cleared it yet. Simply recognizing everyday objects like a shoe or a chair—an ability that is simple for even a two-year-old child—remains difficult if not impossible for even the most sophisticated computers. As this video short reveals, before we can program them to see as we do, we need to decode how our brain sees. With that in mind, AI experts are beginning to make some significant advances.

Running Time: 03:43


GAME SHOW HOST: What do you say we play Jeopardy!?

NARRATOR: Watson is an intelligent machine, and a whiz at Jeopardy!.


WATSON: What is Jericho?


WATSON: $400, same category...

NARRATOR: But his knowledge of the world comes only from the words he processes, alone in this room at IBM. In order to build a true artificial intelligence, one that comes close to matching our own, computers will have to learn from what they see in the world. And it turns out this is more challenging than anyone ever expected.

RODNEY BROOKS: The biggest disappointment to me is how hard it has been to build general-purpose vision.

RAJIT RAO: Vision is utterly subconscious. You open your eyes and the world is there. Almost half of our brain is working to make it easy for us.

NARRATOR: And our eyes are only the start. What we actually see is determined by what we know.

PATRICK WINSTON: We sometimes say that vision is a kind of controlled hallucination.

ALEXEI EFROS: It's not just the pixels. A large part of it is defined by our previous experience. Our visual memory. The example I always use is the Monet paintings. So you have this train station. You have a train extending into the scene. And then when you look close, you realize that there is basically nothing there, some splotch of paint. And yet we all see a train because of the previous experience that we've had.

NARRATOR: Until scientists can duplicate how humans see, they are training computers to see by example. Millions of examples, drawn from the seemingly endless supply of digital images on the Internet.

Hartmut Neven helped create one of the best computer vision programs available, Google Goggles.

HARTMUT NEVEN: There's sort of a laundry list of things it can recognize at this point.

NARRATOR: Goggles starts by looking for features in an image—points, angles, and pixels—and compares those features to hundreds of possible look-alikes it identifies in its databases.

HARTMUT NEVEN: Any incoming picture is analyzed, and the most salient features of those are compared against similar features extracted from the database images.

NARRATOR: Because it was trained to hunt for features among millions of different images, it now uses statistics to designate the most likely match. It may not look anything like human vision, but computers are starting to see the world in their own way.
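The pipeline the narrator describes can be sketched, very loosely, as nearest-neighbor voting over feature descriptors. This toy example is an assumption-laden illustration, not Goggles itself: the image names and tiny three-dimensional vectors are invented, whereas a real system uses high-dimensional descriptors (such as SIFT) extracted from millions of images and matched with indexed search rather than brute force.

```python
import math

# Hypothetical toy "feature database": each known image is reduced to a
# handful of small descriptor vectors. (Real descriptors are typically
# 128-dimensional, and the database holds millions of images.)
DATABASE = {
    "eiffel_tower": [(0.9, 0.1, 0.3), (0.8, 0.2, 0.4), (0.7, 0.1, 0.5)],
    "mona_lisa":    [(0.1, 0.9, 0.2), (0.2, 0.8, 0.1), (0.1, 0.7, 0.3)],
}

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(query_features):
    """Each query feature votes for the database image holding its
    nearest descriptor; the image with the most votes is the
    statistically most likely match."""
    votes = {}
    for qf in query_features:
        best_image, best_dist = None, float("inf")
        for image, feats in DATABASE.items():
            for f in feats:
                d = distance(qf, f)
                if d < best_dist:
                    best_image, best_dist = image, d
        votes[best_image] = votes.get(best_image, 0) + 1
    return max(votes, key=votes.get)

# A query whose features resemble the "eiffel_tower" descriptors.
print(recognize([(0.85, 0.15, 0.35), (0.75, 0.12, 0.45)]))
# prints "eiffel_tower"
```

The voting step is why such systems degrade gracefully: a few mismatched features are outvoted, but a smooth, edge-free object yields too few distinctive features to vote at all, which is the failure mode described next.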

HARTMUT NEVEN: 'Cause there's probably a couple billion different objects that Google Goggles can recognize at this point.

NARRATOR: Including more paintings than a human ever could, with surprising accuracy. But it's not close to perfect. Give it a three-dimensional object without sharp edges, and it's lost.

HARTMUT NEVEN: A little dog. [laughs]

RODNEY BROOKS: The stuff that anyone could do, a two-year-old could do, like recognize a shoe, or a chair...

HARTMUT NEVEN: Because the chair is yellow, I get things like sunsets and even a yellow fish.

RODNEY BROOKS: It turns out that's the really hard stuff, that even with today's machine learning, we can't do well yet.

HARTMUT NEVEN: Once we have a perfect vision system, we will have perfect A.I.

NARRATOR: But perfect vision may not come until scientists decode how our brain sees.

RAJIT RAO: What I would like is to go out at some point with a little robot or a computer which is looking at the world much in the same way as a kid, and come back learning these things with minimal, minimal supervision.


Produced for NOVA by
Michael Bicks
Field Produced by
Sharon Kay
Edited by
William A. Anderson and Daniel Gaucher
Senior Produced by
Julia Cort
Narrated by
John McKenzie
Animation by
Ekin Akalin, Tuna Bora, Mark Schornak, Michael Hickman, and Kristen Larson
Graphics by
Pam Benjatanaporn
Photographs courtesy of
Jason Freidenfelds, Colin Guillas, David Haller, Ryan Harvey, Chester Kay Shenghung Lin, Yiannis Logiotatides, and Mike Wiacek
