I, Cringely - The Survival of the Nerdiest with Robert X. Cringely

The Pulpit


Weekly Column

TV Oaxaca: How Narrowband Streaming Video Could Serve 90 Million Stranded Americans

By Robert X. Cringely
bob@cringely.com

There are competing technologies and enabling technologies, and the difference between these terms is crucial. When the first PostScript printers came along almost 20 years ago, they enabled a digital graphics industry -- an industry where today many technologies compete. But without PostScript to show what was possible, many of those competing technologies might never have been invented, and the graphics industry would probably not be as rich as it is today. Last week's column about the intrinsic value of the circuit switched telephone network was really about enabling technology, but many readers didn't see it that way. They saw it as just silly.

"The circuit-switched network isn't really circuit-switched anymore and hasn't been for years," was one common criticism of my idea that with the advent of an enabling technology the old phone system might actually be a superior distribution system for video on demand. The example most readers used was making a call on Mothers' Day: If circuit switching is so great, why can't I get through to Mom? If Mom lived across the street or across town, you could reach her. The fact that you can't reach her on Mothers' Day usually comes down to limitations of the long distance network, not the local switched system. And yes, it still is a local switched system as long as that pair of wires going into your house doesn't go into my house, too.

But most complaints had to do with my argument that significant bandwidth savings could be found if video encoding systems were based less on the model of file transfer and more on the model of how we actually see. "If the eye looks at only a tiny part of the scene," went the reader argument, "sure you can get by with small bandwidth for one viewer, but what if more than one person is watching?"

We can track the eye, but it is better to be able to predict in advance where the viewer will gaze and encode only that data. It can be done.

There is a company in Berkeley, California, called Viasense, and it is the commercial manifestation of more than 25 years of research on sight at the UC Berkeley Department of Neurophysiology. Viasense is a startup built literally on salamander eyeballs. I first met the founders two years ago when their goal seemed to be to replace the MPEG video compression architecture that's at the heart of DVDs and satellite and digital TV. Well, that hasn't happened yet, but the company has learned a lot. They learned, for example, that it is hard to get the world to throw away stuff it has already bought and that still works. Being twice as good isn't enough; if you really want people to trash their current equipment, you need to be 10 times as good. Viasense wasn't 10 times as good, so we not only didn't throw away our equipment, most of us didn't even know it was an option to do so.

The older and wiser Viasense of today is taking the more pragmatic approach of making current technologies better, not trying to compete with them. Help me help you.

What Viasense has to sell is something they call Integrated Perceptual Guide, or IPeG. Though IPeG is a video encoder in its own right, what it mostly does is help select WHAT to encode, because encoding information that isn't needed or used is just a waste. Video compression technologies typically deal with spatial, frequency domain, and temporal data, but IPeG adds to this the concept of perceptual priority.

Here's the gist of what the Viasense founders learned from all those salamander eyeballs. The retina, whether it is in a salamander or a South Carolinian, gathers visual data, encodes that data, then sends it over the optic nerve to the brain's visual cortex. Since the optic nerve is a slow and bumpy road, and since we rely on our eyesight all the time to keep us from being killed, what vision comes down to is as much the avoidance of error as it is the acceptance of image. So the retina makes an estimate of a visual scene or image based upon evolutionary knowledge of the statistical structure of natural scenes. The retina then estimates the likely error in that original estimate. Each of these functions is embodied in a specific segment of the retinal architecture. The retina then transmits to the rest of the brain what can be described as a real-time, two-dimensional map of the likely error or uncertainty of the original estimate.

What blows me away, if I am understanding this correctly, is that what we "see" isn't the scene itself so much as an error map of the scene. We map the cliffs and potholes, then paint the rest of the scene in our minds from stored image data.

Weird, eh?
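If it helps to make that concrete, here is a toy version of the general idea in Python. To be clear, this is ordinary predictive coding, not Viasense's actual algorithm; the function name, the local-mean predictor, and the threshold are all my own inventions for illustration.

    import numpy as np

    def error_map(frame):
        """Toy analogue of the retina's trick: make a cheap estimate
        of each pixel from its neighbors, then keep only the places
        where the estimate is wrong. Generic predictive coding, not
        Viasense's algorithm."""
        f = frame.astype(float)
        padded = np.pad(f, 1, mode="edge")
        # Estimate: the mean of the four axis-aligned neighbors.
        estimate = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                    padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        # The "error map": large only where the scene surprises us.
        return np.abs(f - estimate)

    # A synthetic frame: a flat gray field with one bright square.
    frame = np.full((64, 64), 128.0)
    frame[20:40, 20:40] = 220.0

    err = error_map(frame)
    # Nearly all the error energy sits on the square's edges, so the
    # map is sparse even though the frame itself is not.
    print(f"pixels with significant error: {(err > 5).mean():.1%}")

Run it and almost everything comes back zero; the only pixels worth talking about are the edges of the square, which is exactly the "cliffs and potholes" point.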

Think of it this way. You are crossing a busy street on foot. Cars weighing 4,000 lbs. are streaming past you at 30 mph as you stand on the curb. You have to pick a moment to step into traffic and cross the street, and that moment must be chosen primarily on the basis of visual information. What's your perceptual priority at that exact moment? Is it deciding whether a car is mint green or sea green or kelly green? Or is your perceptual priority deciding whether the car is likely to hit you if you step off the curb? Of course, it is the latter, and that's why we place such great visual priority on things like movement and the edges of objects.

This biological process is duplicated in IPeG technology. Prepare for technospeak. The IPeG signal, or map of uncertainty, describes the perceptual significance of pixels within a scene. Pixels for which the uncertainty value is high, whether in the temporal and/or spatial domain, are also likely to be of high perceptual significance. The IPeG signal permits prioritization of pixels in a scene according to statistical ranking of their perceptual significance. Quantizing the IPeG signal in the spatial domain (intra-frame) results in the natural and automatic segmentation of object regions within an image. Used in the temporal domain (inter-frame), IPeG can be used to identify important changes to objects that occur within a scene.
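In code, that paragraph might look something like the sketch below. The actual IPeG signal is Viasense's proprietary model; the scoring here, gradient magnitude for the spatial term, frame differencing for the temporal term, and a quantile cutoff for the statistical ranking, is a generic stand-in I made up to show the shape of the idea.

    import numpy as np

    def priority_mask(prev, curr, keep=0.05):
        """Generic stand-in for an IPeG-style priority map: score
        each pixel's 'uncertainty' and keep only the top fraction.
        The scoring below is invented for illustration."""
        p, c = prev.astype(float), curr.astype(float)
        # Spatial (intra-frame) term: gradient magnitude, a crude
        # proxy for edges and object boundaries.
        gy, gx = np.gradient(c)
        spatial = np.hypot(gx, gy)
        # Temporal (inter-frame) term: what changed since last frame.
        temporal = np.abs(c - p)
        score = spatial + temporal
        # "Statistical ranking": keep the top `keep` fraction of
        # pixels; everything below the cutoff is low priority.
        cutoff = np.quantile(score, 1.0 - keep)
        return score >= cutoff  # boolean map of high-priority pixels

An encoder sitting behind something like this would spend its bits on the pixels the mask selects and starve the rest.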

This isn't video compression. We haven't made it to that part yet. It is just throwing away parts of the scene that exist, sure, but that make no important difference to the way we perceive it. And the savings are profound. IPeG perceptual priority data averages about one percent of the total information in a scene. Add another two to four percent for chrominance and luminance data to make the images look pretty, and you get a total of roughly three to five percent, or about 20-to-1 effective compression, before we even start compressing.
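The arithmetic, using the column's own figures (these percentages are Viasense's claims as reported here, not anything I have measured):

    # Cringely's budget: ~1% of the scene for perceptual-priority
    # data, plus 2-4% for chrominance and luminance.
    priority = 0.01
    chroma_luma = 0.04              # take the high end
    kept = priority + chroma_luma   # ~0.05 of the original data
    print(f"data kept: {kept:.0%}")                 # -> 5%
    print(f"effective ratio: {1 / kept:.0f}-to-1")  # -> 20-to-1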

Of course, IPeG has to function in a world where we already think we have pretty good video compression, so Viasense has built IPeG plug-ins for popular codecs like MPEG-4 and H.264. For broadband content, adding IPeG cuts bandwidth requirements by about 40 percent, meaning a 300 kbps video stream will require only 180 kbps, or a 300 kbps stream can look as good as a 500 kbps stream without IPeG.

Apple announced its H.264 codec for QuickTime this week, promising HDTV resolution in six megabits per second. With IPeG, that would be 3.6 Mbps.

But where IPeG really shines is on low-bandwidth connections like mobile phones and, yes, the old circuit-switched phone system. This makes sense because, as resolutions diminish, the importance of error detection has to go up. So for narrowband connections, IPeG cuts bandwidth requirements by about 90 percent. That suggests that whatever video you could receive today over your V.92 modem (53 kbps maximum throughput) would require only about 6 kbps in IPeG form. And if you could actually get 53 kbps over your dial-up connection, IPeG could make possible a picture that would require up to 450 kbps using current technology.
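Plugging the claimed savings into a calculator makes the before-and-after explicit. Again, the 40 and 90 percent figures are the article's claims, not benchmarks of mine:

    def with_ipeg(kbps, savings):
        """Bitrate needed once the claimed savings are applied."""
        return kbps * (1.0 - savings)

    # Broadband: ~40 percent savings claimed for the codec plug-ins.
    print(with_ipeg(300, 0.40))   # 300 kbps stream -> 180 kbps
    print(with_ipeg(500, 0.40))   # 500 kbps quality in a 300 kbps pipe
    print(with_ipeg(6000, 0.40))  # Apple's 6 Mbps HDTV -> 3.6 Mbps

    # Narrowband: ~90 percent savings claimed on dial-up-class links.
    print(with_ipeg(53, 0.90))    # 53 kbps V.92 ceiling -> ~5.3 kbps,
                                  # which the column rounds to 6 kbps
    print(53 / (1.0 - 0.90))      # a full 53 kbps channel carries
                                  # ~530 kbps worth; the column's
                                  # "up to 450" is more conservative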

But remember, this IS current technology, since it is built as a plug-in, with the IPeG decoder module requiring only 30K.

Now imagine a media business built on these ideas. America has a huge Spanish-speaking population but only a few Spanish-language cable channels, all of which get their programming from Mexico City and Caracas. But if I'm from Oaxaca, heck, I want to watch TV from Oaxaca, and if the picture is a little smaller or a little slower, I'll still be happy, because it serves my needs better than what I am getting now. Here's where IPeG finally does qualify as an enabling technology.

There are about 90 million American residents for whom English is not the language they speak at home, and most of those are ill-served by current TV programming. Programming in their language exists; it just isn't here...yet.

This is a huge little opportunity. It is huge in that 90 million people are a lot of people, and little in that, for the most part, these language minorities are too broadly distributed to be a market reachable by anything except a telephone system. And with narrowband video-on-demand combined with V.92 modem technology, whose modem-on-hold feature suspends the data connection for an incoming voice call, Grandma can even pause her favorite soap opera to take a call.
