I, Cringely - The Survival of the Nerdiest with Robert X. Cringely

The Pulpit


Weekly Column

'Cause Backing-up is Hard to Do: Introducing Baxter, a Peer-to-Peer Backup Network

By Robert X. Cringely
bob@cringely.com

Twenty-five years ago, I wrote a book using a line editor running on an IBM 370/168 mainframe as my word processor. It was the only computer I had then, so what the heck. Working late one night on my ADM-3A dumb terminal, connected to the mainframe over a 300-baud dial-up connection, I pushed the wrong key and sent 8,000 lines -- almost 100,000 words -- into oblivion. It was one of those moments of instant clarity where the accompanying blast of adrenaline seemed to slow time to a crawl. My finger was still on the key when I realized what I'd done. I wondered whether, if I just kept pressing down on the key, I could keep the command from executing. Of course not. So I hustled across campus to beg someone to recover my file from a backup tape. The backup tape existed (or so they said), but it was apparently unreadable. My book was gone, and I'd have to start over.

It could have been worse. Lawrence of Arabia left the handwritten manuscript of his book, Seven Pillars of Wisdom, at Reading station while changing trains, losing forever 350,000 words that he subsequently wrote again from scratch. At least I had a variety of printouts to scavenge. But the lesson about saving and backing up was learned forever, reinforced by the several hundred hours of work that mistake added to my project.

At the heart of last week's column about Hurricane Frances was the idea of having a backup strategy, which few individuals (and not even that many organizations) have or implement faithfully. But stercus accidit (loosely, stuff happens), so having some plan to protect your data from a storm, from your two-year-old, or even from yourself is a good idea. So far my thoughts have concentrated mainly on how to transfer the data to some physical medium and keep it with me, but there are better and more scalable alternatives, at least one of which I believe could make a nice business.

Read on.

One offhand suggestion I made last week was to get a Google Gmail account, with its one gigabyte of storage, and simply mail yourself any important documents. I am not a Gmail user, so I didn't know that someone had already written a Gmail hack to automate just such a process. Richard Jones's Python program (in this week's links) creates what he calls the Gmail File System, which can mount that one gig as a virtual volume right on your Linux desktop. From every description the solution sounds Linux-specific, but I'm sure it can be made to work with other UNIX variants, especially since Gmail itself runs on Apple Xserve 1U boxes. Windows compatibility is unknown, but I'm sure someone will solve that soon.

But while it might be easy to use Gmail for offsite backup, I couldn't bring myself to do it because of the intrusive nature of Gmail. Remember that this system is by invitation only, which means Google can quickly map a social network establishing who knows whom. And since Gmail actually analyzes the content of your e-mail and can automatically group it by subject (how creepy is that?), Google knows not only who your friends are, but what you talk about with them.

No thanks.

Still, as Lawrence and I proved, the true cost of lost data can be immense -- far more than most people or organizations realize. A friend of mine who works for a VERY large computer company had the hard disk die on his company notebook computer. He saw that the drive was failing, was well prepared for the data recovery, and followed the company procedures for backing up the drive, replacing it, and reloading applications as quickly as he could. Even though this guy is a computer professional of real talent, the full recovery took 77 hours of his time and cost his employer $9,000. For a friggin' disk drive!

The best solution is to make backup automatic, and there are many ways to do that depending on how much you are willing to pay. But I am very cheap, and I also tend to think of solving a problem as doing work that has value and might actually be worth something as a business. Backup as a business, though, is hard to do cheaply. Apple, for example, will let you mount an iDisk of up to 100 megabytes as part of its .Mac Internet service, but that costs $99 per year. More than eight dollars per month for 100 megabytes of storage is too darned much. I'd say a better price is $3.95 per month or $39.00 per year for UNLIMITED backup. That's a per-user price, and multi-user discounts would apply.

Here's my idea for a data backup service I call Baxter. This is NOT a virtual drive available on your system, but a virtualized backup system that works transparently and requires some time to restore your data. But when you really need it, Baxter will save your butt.

That $3.95-per-month fee covers any amount of storage the user wants, limited only by how much storage they are WILLING TO DONATE TO THE SYSTEM. Think of this as an alternative, and quite a bit more sophisticated, Napster. First, it is for BACKUP, so recovery has to be slow enough that people won't think of it as another hard drive. Baxter is data insurance and nothing more. It's a RAID system using donated disk space on a wide area network. Your data is compressed, encrypted on the customer end (so unencrypted data never enters the system and that vulnerability is eliminated), then cut into chunks, and those chunks are distributed to dozens of places with enough forward error correction thrown in to cover any storage that is lost or happens to be down when recovery is needed. Because the data is split into chunks, no one person has enough of it to make any sense of it even if they could decrypt it. The Baxter business provides the client software, handles divvying up the RAID information, and keeps track of which chunks go where.
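The compress, chunk, and protect steps of this pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Baxter's actual design: it uses zlib for compression and a single XOR parity chunk (RAID-5 style) as the simplest possible forward error correction, so it survives the loss of only one chunk. A network spreading data across dozens of peers would want a stronger erasure code such as Reed-Solomon. Encryption appears only as a marked placeholder because Python's standard library has no cipher; it belongs after compression (encrypted data does not compress) and before chunking. The function names `split_with_parity` and `restore` are hypothetical.

```python
from __future__ import annotations

import zlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> tuple[list[bytes], int]:
    """Compress `data`, cut it into k equal chunks, and append one XOR parity chunk.

    Returns the k + 1 chunks plus the compressed length, which is needed later
    to strip the zero padding added so that all chunks are the same size.
    """
    packed = zlib.compress(data)
    # Placeholder (assumption): a real client would encrypt `packed` here,
    # before chunking, so no unencrypted byte leaves the customer's machine.
    size = -(-len(packed) // k)              # ceiling division
    padded = packed.ljust(size * k, b"\0")   # pad so every chunk has the same length
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)       # XOR of all data chunks
    return chunks + [parity], len(packed)


def restore(chunks: list[bytes | None], packed_len: int) -> bytes:
    """Rebuild the original data when at most one of the k + 1 chunks is missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can recover only one lost chunk")
    if missing:
        # XOR of all k + 1 chunks is zero, so the lost one equals the XOR of the rest.
        survivors = [c for c in chunks if c is not None]
        chunks = list(chunks)
        chunks[missing[0]] = reduce(xor_bytes, survivors)
    packed = b"".join(chunks[:-1])[:packed_len]  # drop parity, strip padding
    return zlib.decompress(packed)


if __name__ == "__main__":
    original = b"Chapter 1. It was a dark and stormy night. " * 500
    chunks, packed_len = split_with_parity(original, k=4)
    chunks[2] = None  # pretend the peer holding chunk 2 is offline
    print(restore(chunks, packed_len) == original)  # True
```

The parity chunk is what makes a peer going offline survivable: any single chunk, data or parity, can be rebuilt from the others, at the cost of 1/k extra storage.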

Even though it is Napster-like in that it knows where all the chunks are, Baxter doesn't know what the chunks are, nor is the end user in a position to use it as a Napster-like system for music sharing, since data recovery is deliberately slow and restorations could be limited in number or restricted to specific IP ranges or MAC addresses. Baxter's administrative fee is only $3.95 per month, and there is a great incentive for people to stay in the system, because if they drop out (or stop paying), they lose access to their own data. Besides, what's $3.95 except A LOT OF MONEY for a two- or three-man company? The main costs are software development, marketing, and buying enough storage to start with so you can offer the full service from the first minute. That extra storage cost will go down over time. And what about the concern that you'll run out of disk space, especially since the redundant data required for RAID will effectively expand the total data size? It won't happen. Compression will probably cover the data expansion, and anyone who enters the system will have to offer up their storage before they get to use any. This opens up large chunks of available storage, yet users probably won't fill their total quota right away, if ever. You can't give 10 gigs and use 20, but you CAN give 10 gigs and use eight, and it might take a month or a year to even use that eight.
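The bookkeeping side can be sketched as well. The hypothetical `Tracker` class below records which peers hold each chunk under an opaque ID that the client computes (here assumed to be a SHA-256 hash of the already encrypted chunk), so the coordinator never sees content. It refuses to let a user store more than they have donated, and it permits restores only from allowed address ranges and only up to a per-account count. The class names, the limit of three restores, and the choice of IP ranges over MAC addresses are all illustrative assumptions.

```python
from __future__ import annotations

import hashlib
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Account:
    donated_bytes: int       # disk space the user contributes to the network
    used_bytes: int = 0      # space taken by the user's own chunks (parity included)
    restores_left: int = 3   # hypothetical cap on the number of restorations
    allowed_nets: list[ipaddress.IPv4Network | ipaddress.IPv6Network] = field(
        default_factory=list
    )


class Tracker:
    """Hypothetical coordinator: knows where chunks live, never what they contain."""

    def __init__(self) -> None:
        self.accounts: dict[str, Account] = {}
        self.placement: dict[str, list[str]] = {}  # chunk id -> ids of peers holding it

    def can_store(self, user: str, nbytes: int) -> bool:
        """You can donate 10 GB and use 8, but never use more than you donate."""
        acct = self.accounts[user]
        return acct.used_bytes + nbytes <= acct.donated_bytes

    def record_chunk(self, user: str, chunk_id: str, nbytes: int, peers: list[str]) -> None:
        """Remember where an opaque chunk was placed and charge it to the user's quota."""
        if not self.can_store(user, nbytes):
            raise PermissionError(f"{user} would exceed donated space")
        self.placement[chunk_id] = list(peers)
        self.accounts[user].used_bytes += nbytes

    def authorize_restore(self, user: str, client_ip: str) -> bool:
        """Allow a restore only from an allowed network and while restores remain."""
        acct = self.accounts[user]
        addr = ipaddress.ip_address(client_ip)
        if acct.restores_left <= 0 or not any(addr in net for net in acct.allowed_nets):
            return False
        acct.restores_left -= 1
        return True


if __name__ == "__main__":
    GB = 10**9
    tracker = Tracker()
    tracker.accounts["alice"] = Account(
        donated_bytes=10 * GB,
        allowed_nets=[ipaddress.ip_network("192.0.2.0/24")],  # documentation range
    )
    # The client hashes an (already encrypted) chunk and reports only its ID and size.
    chunk_id = hashlib.sha256(b"encrypted chunk bytes").hexdigest()
    tracker.record_chunk("alice", chunk_id, 8 * GB, ["peer-17", "peer-42"])
    print(tracker.can_store("alice", 3 * GB))               # False: 11 GB > 10 GB donated
    print(tracker.authorize_restore("alice", "192.0.2.7"))  # True
```

Keeping the chunk ID opaque is the design point: the tracker can answer "where is it?" for every chunk without ever being able to answer "what is it?".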

This is not to say that offering a service like Baxter would be easy. There are plenty of problems synchronizing data over unreliable connections without throwing in all these Baxterrific features. But a small group of smart people could do it easily and cheaply, providing a very valuable service to the world at the same time. And if anyone actually does this, I'm hoping for a free account.
