I, Cringely - The Survival of the Nerdiest with Robert X. Cringely

The Pulpit


Weekly Column

In the Eye of the Beholder: Why the Best Device for Streaming Video Might Be a Mobile Phone

By Robert X. Cringely
bob@cringely.com

When I was a kid in the 1950s, someone invented these little tables that would allow people to eat dinner while they watched television. Today, we would just put a television in the kitchen, but back then, TVs were expensive and deserved shrine status. So we came, bearing Salisbury steaks, to worship at that altar. Television was a group experience even if eating on those little tables was strictly an individual one. But now it occurs to me that we are on the verge of a role reversal, making television viewing an individual experience even for those who are eating together. It may be the best way yet to view streaming video, and the only way to make streaming video a profitable business.

I have written several times before about the dismal economics of streaming video. The culprit is bandwidth. Streams generally have to be duplicated, one for every viewer or listener, so there are almost no economies of scale. The bandwidth required to send a good picture costs so much that streaming video AS WE KNOW IT is generally not a profitable business except, perhaps, for movie trailers and some audio.
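
To see how brutal that duplication is, run the numbers. Here is a little Python sketch; the stream rate is an illustrative figure I made up for the example, not anyone's real tariff:

    # Why unicast streaming has no economies of scale: every viewer gets a
    # private copy of the stream, so server bandwidth grows linearly with
    # the audience. The per-stream rate is an illustrative assumption.
    stream_kbps = 300                      # one watchable stream (assumed)
    for viewers in (1_000, 10_000, 100_000):
        total_mbps = stream_kbps * viewers / 1_000
        print(f"{viewers:>7} viewers -> {total_mbps:>8,.0f} Mbps served")

A broadcaster pays the same to reach one viewer or a million; a streamer pays a million times more.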

Of course, all this will change when bandwidth becomes cheap as dirt, something we have been expecting pretty much any day now for the past four to five years. But with Enron and Global Crossing in bankruptcy, the advent of almost-free bandwidth is suddenly further away than ever. If streaming video is ever to be a mainstream success, it needs some kind of boost, some improvement that will quickly make it substantially better than it is today.

This is a lot to ask, because improvements in video compression have come slowly and, at this point, are largely dependent on processor speed. So doubling performance will require a doubling of processor speed, which we all know is 18 months away. It would be great not to have to wait that long. Even better would be to come up with a 4X performance increase, about what it would naturally take three years to accomplish through the kindness of Moore's Law, virtually overnight. Fortunately, such an improvement is possible.

Such an improvement isn't going to come through changing codecs, per se. Every compression/decompression application has its own strengths and weaknesses, and it takes a fair amount of time for new software to trickle down from the labs, then fight its way back up through the standards committees. And at lower bandwidths, like dial-up modems and even some DSL connections, every codec struggles.

Here is my myopic view of the video compression landscape. The emerging big kahuna is MPEG-4, which is based primarily on the H.263 video conferencing codec. Like all MPEG codecs, it uses key frames: complete frames transmitted every second or so, with the interim frames derived by the codec from picture elements held in buffer memory and a motion estimation algorithm. This is why MPEG codecs are considered "non-editable": not all frames can be reconstructed, only estimated. For editable content, you need something like Motion JPEG, which compresses each frame individually. H.263 added a second type of interim frame, one that sits somewhere between a key frame and a derived frame, with the result that performance is quicker. MPEG-4 is best for rapidly changing content.

But there are three problems with MPEG-4. First, it doesn't do well at really low data rates. Second, the key frames look sharp but the interim frames don't, so you get a sharp-fuzzy-sharp-fuzzy-sharp-fuzzy thing going, sometimes referred to as "frame snap." And third, when MPEG-4 fails, it fails into blocky DCT artifacts.
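
To make that key frame business concrete, here is a toy Python sketch, not any real MPEG code: key frames stand alone, while every interim frame here is just a raw difference from its predecessor (real codecs use motion estimation instead of raw differencing). It shows why you can't cut the stream on an arbitrary frame:

    import numpy as np

    KEYFRAME_INTERVAL = 30   # one complete frame per second at 30 fps

    def encode(frames):
        """Yield ('key', frame) or ('delta', difference-from-previous).
        Frames are assumed to be signed-integer numpy arrays so the
        differences don't wrap around."""
        previous = None
        for i, frame in enumerate(frames):
            if i % KEYFRAME_INTERVAL == 0:
                yield ("key", frame)               # self-contained
            else:
                yield ("delta", frame - previous)  # useless without history
            previous = frame

    def decode(stream):
        """Rebuild frames; a delta frame needs every frame back to the
        last key, which is why arbitrary frames can't be recovered."""
        current = None
        for kind, payload in stream:
            current = payload if kind == "key" else current + payload
            yield current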

All codecs based on the discrete cosine transform (DCT), which is to say all the MPEGs, H.261, H.263, and Real, will have blocky artifacts. Non-DCT codecs like the Sorenson wavelet codec used in QuickTime (http://www.sorenson.com), the Matching Pursuits codec from UC Berkeley (http://www.truvideo.com), and the low-bandwidth codecs from the University of Strathclyde (http://www.essential-viewing.com) and Bath University (http://www.xiwave.com) all degrade gracefully. You may notice a trend here of university professors starting companies to push their codecs.

But none of these codecs is four times better than the others. My favorite is Avideh Zakhor's Truvideo from Berkeley, but even there, we are talking about perhaps a 10 to 15 percent improvement in bandwidth efficiency, not 400 percent.

In order to get a 400 percent improvement, we need to take a completely new look at the viewing experience. It isn't so much necessary to change how we compress video as how we approach the business of viewing it. In this case, what I am proposing (well, not proposing, actually, since I couldn't do any of this stuff to save my life; rather I have found the work of others that I think should be widely adopted) is to take a closer look at the human process of vision and use this to save bandwidth. We've done this before. Many codecs take advantage of the fact that human sight is much more sensitive to levels of luminance (brightness) than chrominance (color). These codecs, which include every one mentioned so far in this column, save bits by encoding luminance at a higher resolution than chrominance. Typically, each pixel is encoded for luminance, but chrominance is averaged for groups of four pixels, saving about half the bandwidth required to transmit the full-resolution image.
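
Since this column is about squeezing bits, here's a minimal numpy sketch of that luminance/chrominance trick: luma kept per pixel, each chroma plane averaged over 2x2 pixel blocks. The function name is mine, not any codec's API, and plane dimensions are assumed to be even:

    import numpy as np

    def subsample_chroma(Y, Cb, Cr):
        """Keep luminance at full resolution; average each chrominance
        plane over 2x2 blocks of pixels."""
        h, w = Y.shape
        Cb_sub = Cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        Cr_sub = Cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        return Y, Cb_sub, Cr_sub

    # The savings, counted in samples for a 640x480 frame:
    #   full resolution: 3 planes x 640 x 480        = 921,600 samples
    #   subsampled:      640 x 480 + 2 x (320 x 240) = 460,800 samples
    # which is the "about half the bandwidth" mentioned above.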

So let's do this again. Let's take advantage of another peculiarity in the way sight works in order to save even more bandwidth. The process we'll abuse this time is called foveation. The fovea is the small central patch of the retina that sees fine detail, and it covers only the point in a scene where your attention is actually focused. It turns out that we humans may think we are looking at the big picture, but the truth is that we really concentrate our gaze in a tight area, like the face of the person we are speaking with or the innards of the watch we are trying to fix. The fovea covers quite a small area and occupies hardly any real estate in, say, a TV or movie screen. We move the fovea around, following the action in the scene, looking at the faces of characters as they speak. And of course, ogling the women. It is not that we don't see the rest of the picture, but we see those parts at a much lower resolution. Rapid motion in some other part of the scene can cause us to move the fovea over to take a look, but until the fovea gets there, we see no real detail in those parts of the scene.

We could apply this technique to video compression by devoting greater amounts of bandwidth to the fovea and lesser amounts to the rest of the scene. Wavelet codecs, which can derive video streams at multiple resolutions from the same data set on the server, are ideal for this. But the technique is not limited to wavelets. There are DCT codecs that have been modified for foveation too, and quite successfully.
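
Here is a rough Python sketch of that bit allocation, using crude pixel decimation as a stand-in for the quantizer or wavelet-coefficient tricks a real foveating codec would use. The gaze coordinates and fovea radius are made-up parameters:

    import numpy as np

    def foveate(frame, gaze_x, gaze_y, fovea_radius=40):
        """Full resolution near the gaze point, progressively less with
        distance. Decimation stands in for real bit allocation."""
        h, w = frame.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        dist = np.hypot(xs - gaze_x, ys - gaze_y)
        # Resolution band: 0 = full detail, 1 = half, 2 = quarter, 3 = eighth.
        band = np.clip((dist / fovea_radius).astype(int), 0, 3)
        out = frame.copy()
        for level in range(1, 4):
            step = 2 ** level
            # Decimate to 1/step resolution, blow back up by repetition.
            coarse = frame[::step, ::step]
            coarse = coarse.repeat(step, axis=0).repeat(step, axis=1)
            mask = band == level
            out[mask] = coarse[:h, :w][mask]
        return out

Feed it each frame plus the eye tracker's current coordinates, and everything outside the gaze gets half, a quarter, or an eighth of the resolution, which is where the bandwidth savings come from.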

What you get with foveating codecs is something quite remarkable. Resolution is maximized where the viewer is gazing and minimized everywhere else, with an overall reduction in required bandwidth of three to five times. THERE's our 400 percent bandwidth improvement. Suddenly, what used to take a megabit takes 250 kilobits, and what formerly required ISDN or low-rate DSL can be viewed using a V.90 modem.

But such an improvement doesn't come for free. First, we have to find a way to track the user's eyeballs, since the codec only works if we know where the viewer is actually looking. Lasers are good for this, but expensive. This is the area where the greatest amount of work has to be done to make the system practical. Second, foveation is by definition an individual viewing experience, since two viewers of the same screen would likely be looking at different parts of the scene at the same time. This brings us back to those TV trays and eating alone. Foveation only works if there is a one-to-one ratio of viewers to displays. We can all sit in the same room, but we each have to be watching our own TV.

What is weird about this whole foveation thing is that, when it is done right, a viewer can't sense that his screen is being messed with. Since the fovea is always in high resolution, the computer or TV screen looks completely normal. But for an innocent bystander trying to look over the shoulder of the intended viewer, the screen is almost unreadable. Hey, that's not a bug, it's a feature! No more people looking over shoulders.

I think this is perfect for mobile phones. Using a mobile phone is, by definition, an individual experience, so there is no lost camaraderie there. And as video comes to phones, it will most likely do so not in the form of larger screens, but through wearable display devices that are much higher resolution. Retinal scan displays are my personal favorites. Microvision is the top company in that business, and what I would like them to do is build a cheap display and find a way to track eye motion at the same time. There is already a laser at work in these devices, so why not use it during the vertical scanning interval to track eye movement? The result would be that mobile phones could become the streaming video devices of choice. Heck, they could become the TVs of the future.

Think about it. In the U.S., Verizon is just now rolling out a new mobile phone data service that runs at up to 144 kilobits per second. Multiply that by four to cover the foveation improvement, and it is the equivalent of a nearly 600 kilobit-per-second compressed video stream. With any of the codecs mentioned here, that is at least a VHS-quality picture, and some would claim DVD quality. Verizon could grab a hundred or so channels of video and multicast them on its network to owners of retinal scan video phones, each of which would appear to its user to be a cinematic display. As 3G mobile networks are rolled out worldwide, users could be watching HDTV-quality video while riding the bus. This could be the salvation of those expensive networks. Suddenly, mobile phones could be all Jerry Springer, all the time!
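
For the record, here is that arithmetic in a few lines of Python; the fourfold gain is my assumption from the foveation argument, not a measured number:

    # Back-of-the-envelope: what a foveating codec makes of a 144 kbps link.
    link_kbps = 144            # Verizon's new data service, per the column
    foveation_gain = 4         # the ~400 percent improvement argued above
    effective_kbps = link_kbps * foveation_gain
    print(effective_kbps)      # 576 -- "nearly 600 kilobits per second"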

The parts are available right now. Most of the foveation work is being done at the University of Texas at Austin. I have no connection with any of these outfits and certainly no money in them. I don't even know if the people involved are aware that such an improvement is possible. This is just one of those things I want to see happen. Blame my addiction to the Weather Channel.
