Data, Know Thyself: The Power of XML is Going to Change Everything About Computing. Now If Only I Could Describe It
bob@cringely.com
I've been waiting to comment on Microsoft's .NET business because it seems to me that .NET is far less important than the XML technology that lies beneath it. Same for Sun ONE, the most obvious competitor for .NET. XML is more interesting. But there is a risk in tackling such a technical subject. The risk is that my effort will be unsatisfying to nearly everyone. To the experts who breathe XML, what I write will inevitably look naive and maybe even incorrect. To the lay readers of this column, it might read like gibberish. So I've lain this week in a fetal position and tried to connect emotionally with both groups. What follows is my take on the real meaning of XML. If you think I'm stupid, well maybe so. Just don't tell my Mom.
XML is the new religion at Microsoft. CEO Steve Ballmer calls what is happening "the XML revolution" — a revolution so important that Ballmer is working hard pivoting all of Microsoft to take advantage of it. In three years, he says, XML is going to change completely the way we use computers, and most of Microsoft's competitors see it that way, too. But what the heck is XML, anyway, and why should we care?
XML stands for eXtensible Markup Language, and is just the latest descendant of the Generalized Markup Language (GML) invented years ago at IBM as a kind of common style sheet for technical reports. The Generalized Markup Language begat the Standard Generalized Markup Language (SGML) that begat the HyperText Markup Language (HTML) that made possible the World Wide Web. All these predecessors to XML were about describing a page, whether on paper or on a computer screen. Where is the text? Where are the pictures? What fonts are used on what color background? When a web page is written in HTML, it can be rendered (drawn on the computer screen) by any browser application like Microsoft Internet Explorer or Netscape Navigator running on almost any kind of computer, and should look pretty much the same on each, right down to those clever push buttons, check boxes, and radio buttons.
What's odd about HTML is that while it does a perfectly good job of describing how a web page will look and function and what words it will contain, HTML has no idea at all what any of those words actually mean. The page could be a baby food ad or plans to build an atomic bomb. While HTML knows a great deal about words, it knows nothing at all about information.
Say we want to find specific information somewhere on the Internet. Most of us would use a search engine. Like every other web page, a search engine like Excite or Google is presented using HTML, but it isn't the HTML that finds us the stuff for which we are looking. HTML is not a programming language. People don't write programs in HTML. Instead, HTML describes how a web page looks and in that capacity it contains the text of that page, but to HTML, words are just words, not information. It is up to the search engine to invoke another program to first parse (read) the HTML, stripping away all the parts having to do with page description and leaving just the text. Then the text has to be identified and read by the search engine program, which is generally looking for key words specified by the user. It's a dumb system that usually only looks for word patterns in the text with little regard for what the text is actually about.
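For the curious, here is roughly what that dumb approach looks like in Python. Treat it as a sketch only, not a description of how Excite or Google actually work: the two sample pages and the function names are made up for illustration. It strips the markup, keeps the words, and checks whether the keywords appear.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps the text of an HTML page and throws the markup away."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for the text between tags; the tags themselves are ignored.
        self.chunks.append(data)


def page_text(html):
    """Return the page's visible text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def matches(html, keywords):
    """True if every keyword appears as a word somewhere in the page text."""
    words = set(re.findall(r"[a-z]+", page_text(html).lower()))
    return all(keyword.lower() in words for keyword in keywords)


# Two hypothetical pages: one about fly fishing, one that merely uses the words.
PAGE = "<html><body><h1>Fly Fishing Basics</h1><p>Choose a <b>rod</b> and a reel.</p></body></html>"
OFF_TOPIC = "<html><body><p>I hate to fly, and fishing bores me.</p></body></html>"

on_topic_hit = matches(PAGE, ["fly", "fishing"])
off_topic_hit = matches(OFF_TOPIC, ["fly", "fishing"])
```

Both pages count as hits for "fly fishing," even though only one of them is about fly fishing. The program sees words, not meaning.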
HTML describes web pages but has nothing at all to say about what the page means. It describes fonts and positioning and which parts of the page are hyperlinks and which are text, but the HTML doesn't know whether the page is about fly fishing or pornography. XML, on the other hand, describes data, not pages. It is all about fly fishing or pornography — about the actual information content — but says nothing about layout. You still need HTML for the layout part. So XML is not in any way a replacement for HTML.
The power of XML, then, is that it makes applications aware of what they are about. An XML search engine, for example, wouldn't have to drag back all the text and analyze it for content. It would just send out a message saying "All pages that are about fly fishing, please identify yourselves!" And they would.
XML makes web content intelligent. And by doing so, it enables us to move beyond the current world where we look at the Internet through browsers to a more advanced world where every application is Internet-aware and maybe the browser disappears as a popular application. Once your spreadsheet talks XML, it can link across the Net into other spreadsheets and into server-based applications that offer even greater power.
That's at the heart of Microsoft's .NET (dot-NET) initiative, which puts little XML stub applications on your PC that don't actually do much until they are linked to the big XML servers Microsoft will be running over the Internet. All your office applications become XML-aware, which means you can do powerful things on tiny computers as long as you continue to pay rent to Microsoft. The effect of dot-NET is cooperative computing, but the real intent is to smooth Microsoft's cash flow and make it more predictable. .NET will move us from being owners to renters of software and will end, Microsoft hopes forever, the tyranny of having to introduce new versions of products just to get more revenue from users. Under .NET, we'll pay over and over not just for the application parts that run on Microsoft's computers rather than ours, but also for the data itself, with Microsoft taking a cut, of course.
Probably the biggest reason we even care about this stuff is the Y2K computer crisis of the late 1990s. Back then, we had hundreds of thousands of mainframe computer programs that didn't make a lick of sense to even the smartest programmer trying to read their code. Their original programmers had retired or died, taking with them any notion of what most of the computer code actually meant. So a whole new batch of programmers had to find ways to assign meaning to what appeared to be gibberish. What made the job so difficult was that the code wasn't readable by humans. Huge databases contained hundreds of millions of entries, but which words were the customer names and which were their account numbers? It wasn't at all clear, and every program was different from every other program.
XML changes all that by introducing the concept of metadata — data about data. In XML, each piece of data not only includes the data itself, but also a description of the data, what it means. Now your XML database can have a list of names (that's the data) and a tag on the data saying that these are customer names (that's the metadata). Should some Y2K-like catastrophe afflict our XML database, it would be easy for any programmer to look at the metadata to reconstruct the database program. In fact the metadata is the program, which is how those fly fishing pages were able to announce themselves in an earlier example.
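To make the idea of metadata concrete, here is a small sketch in Python using the standard xml.etree.ElementTree module. The element names (customers, customer, name, accountNumber) and the records themselves are invented for this example and don't come from any real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical records. The tags are the metadata; the text inside is the data.
RECORDS = """<customers>
  <customer>
    <name>Jane Doe</name>
    <accountNumber>10042</accountNumber>
  </customer>
  <customer>
    <name>John Smith</name>
    <accountNumber>10043</accountNumber>
  </customer>
</customers>"""


def read_customers(xml_text):
    """Return (name, account number) pairs by asking for each field by its tag."""
    root = ET.fromstring(xml_text)
    return [
        (customer.findtext("name"), customer.findtext("accountNumber"))
        for customer in root.iter("customer")
    ]


customers = read_customers(RECORDS)
# customers == [("Jane Doe", "10042"), ("John Smith", "10043")]
```

Nobody has to guess which column holds the names and which holds the account numbers. The file says so itself.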
Once we embrace XML, and nearly the entire computer industry already has, then wondrous things begin to happen. Airline ticket databases suddenly are aware that's what they are. So within the constraints of a vocabulary limited to words like "passenger" and "seat number," finding the cheapest way from here to there becomes a matter of just asking. The query — the question you are trying to answer by analyzing data — becomes the database, itself.
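Here is a rough sketch of what "just asking" might look like, again in Python with the standard library. The flight vocabulary (flight, carrier, origin, destination, fare), the carriers, and the fares are all made up for illustration. A real airline schema would be far richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical fare data using an invented vocabulary.
FLIGHTS = """<flights>
  <flight><carrier>Alpha Air</carrier><origin>SFO</origin><destination>JFK</destination><fare>412.00</fare></flight>
  <flight><carrier>Beta Air</carrier><origin>SFO</origin><destination>JFK</destination><fare>389.50</fare></flight>
  <flight><carrier>Gamma Air</carrier><origin>SFO</origin><destination>ORD</destination><fare>199.00</fare></flight>
</flights>"""


def cheapest(xml_text, origin, destination):
    """Return (carrier, fare) for the lowest fare on a route, or None if no flight matches."""
    root = ET.fromstring(xml_text)
    candidates = [
        flight
        for flight in root.iter("flight")
        if flight.findtext("origin") == origin
        and flight.findtext("destination") == destination
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda flight: float(flight.findtext("fare")))
    return best.findtext("carrier"), float(best.findtext("fare"))


best_fare = cheapest(FLIGHTS, "SFO", "JFK")
# best_fare == ("Beta Air", 389.5)
```

The query is written entirely in the vocabulary of the data (origin, destination, fare), with no knowledge of how or where the records are stored.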
XML is leading to a fundamental shift in the way that people store information — away from the traditional use of documents and files to thinking about a document as a container into which various pieces of data are combined or "poured" as required. This kind of document, which is the typical product of an XML application, may be stored for an extended period of time or it may exist only as long as a person interacts with it.
With XML, where the information resides becomes less important than being able to access it. This leads to cooperative applications where modest client machines like handheld computers or mobile telephones can use powerful servers to accomplish major computing tasks. Imagine a television reporter who shoots video of an event, uploads the video to a server, then edits the video using only a mobile telephone. XML can make that possible.
But wait, there's more. That "X" in XML stands for "extensible," which means this is a language to which you are allowed to add new words. And what those new words can signify is almost unlimited. Now that XML makes data more or less self-aware, it is possible to have that data announce itself as it changes. When a favorite stock price jumps or falls, you'll know, because an XML application tracks that stock which now squeals out its state of being trade-by-trade.
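Both ideas, new words and data that reports its own changes, can be sketched in a few lines of Python. The quote vocabulary below is invented for the example, as are the ticker symbol and the prices. Note that the third quote adds a brand-new volume element, and the tracker, which was written before that word existed, keeps working.

```python
import xml.etree.ElementTree as ET

# Hypothetical stream of quotes, one small XML document per trade.
QUOTES = [
    "<quote><symbol>XYZ</symbol><price>10.00</price></quote>",
    "<quote><symbol>XYZ</symbol><price>10.00</price></quote>",
    # A later publisher extends the vocabulary with a new <volume> element.
    "<quote><symbol>XYZ</symbol><price>10.25</price><volume>500</volume></quote>",
]


def price_changes(documents):
    """Return (symbol, old price, new price) each time a tracked price moves."""
    last_seen = {}
    changes = []
    for document in documents:
        quote = ET.fromstring(document)
        symbol = quote.findtext("symbol")
        price = float(quote.findtext("price"))
        # Elements this reader doesn't know about (such as <volume>) are simply ignored.
        if symbol in last_seen and last_seen[symbol] != price:
            changes.append((symbol, last_seen[symbol], price))
        last_seen[symbol] = price
    return changes


changes = price_changes(QUOTES)
# changes == [("XYZ", 10.0, 10.25)]
```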
The end result of adding XML throughout the Internet will be a change in web infrastructure. We'll put much more effort into maintaining and updating data and much less effort into presenting it. Today, huge server farms are used to present to users over and over again what can be beautiful but essentially stagnant pages of stale information. Tomorrow your computer, whether it is on your desk or your wrist, will directly query XML data sources to generate dynamically not the web page as its authors want you to see it, but exactly the web page you want to see. And this page will use only up-to-the-moment data, data that you will probably pay for. And there lies the business opportunity: providing these new data services.
Who will lead this new business? Though XML is an open standard, usable by anyone for free, Microsoft is the early favorite for commercialization with its .NET platform. One aspect of Microsoft's XML push that has gone generally unnoticed is the role of Great Plains Software, Microsoft's most recent and second-largest ever acquisition. Buying Great Plains, a maker of financial software for small and medium-sized businesses, has everything to do with .NET. The Great Plains customer base is ideal for Microsoft's new web services.
Going head-to-head with Microsoft, as always, is Sun Microsystems, whose XML platform is called Sun ONE. Like Microsoft's, Sun's offering is intended to attract companies that want to provide XML-based data services. But Microsoft and Sun have plenty of competition in this emerging market. All of the leading tech vendors, including IBM, Oracle and BEA Systems, are pushing their own technologies for individuals and companies interested in writing and running web services.
If there is one company that will benefit no matter what, it is Cisco Systems, because one characteristic of XML is data expansion. Putting data in XML format and including the metadata description inevitably makes data bigger because we are shipping over the line not just the raw data, but also a description of the data. Bigger data slows down the Internet. A slower Internet makes network service providers build bigger network connections.
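You can see the expansion for yourself with a few lines of Python. This is only an illustration: the two records and the tag names are invented, and the exact overhead will vary with the data and the vocabulary chosen.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the same two customers, shipped two different ways.
ROWS = [("Jane Doe", "10042"), ("John Smith", "10043")]

# Raw data: comma-separated values, one record per line, no description at all.
raw = "\n".join(",".join(row) for row in ROWS).encode("utf-8")

# The same data wrapped in XML tags that describe what each value means.
root = ET.Element("customers")
for name, account in ROWS:
    customer = ET.SubElement(root, "customer")
    ET.SubElement(customer, "name").text = name
    ET.SubElement(customer, "accountNumber").text = account
tagged = ET.tostring(root, encoding="unicode").encode("utf-8")

print(f"raw: {len(raw)} bytes, XML: {len(tagged)} bytes")
```

Every value now travels with an opening tag and a closing tag, so the XML version is several times the size of the raw one for records this small.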
And newer, bigger network connections always mean more work for Cisco.