I, Cringely - The Survival of the Nerdiest with Robert X. Cringely
The Pulpit

Weekly Column

Data, Know Thyself Part 2: Four Reasons Why XML Will Probably Not Meet Our Expectations

By Robert X. Cringely
bob@cringely.com

Last week's column was about the power and potential of XML, the technology that a lot of companies are counting on to drive the next generation of computing. Among those companies is Microsoft, which has a $2 billion bet on XML, a technology it doesn't even own. Remember, XML is a data description language that will allow every application to be web-enabled and will create for each of us a dynamic web giving us exactly the information we are looking for. Not likely, said dozens of hardened, cynical techie readers. So this week's column is part two of the XML story. It is not about the potential of XML, but about its probable reality as a designed-by-committee protocol that is being introduced into an imperfect world. Every technological breakthrough begins with hype and follows on with a certain amount of tears as we actually try to make it work. XML is likely to be no different.

The first problem with XML lies at the very heart of the metadata concept. Remember, metadata is a description of the data that accompanies the data in XML. A lot of the cleverness in XML applications comes from analyzing metadata, then delivering the appropriate information based on that metadata description. But what if the metadata author is a liar? I've built a career explaining how engineers can't lie or things wouldn't work the way they should. But engineers aren't likely to be the ones generating most of that metadata. There are very strong reasons why a metadata author would want to lie if doing so directed more readers to his data. If we actually rely on metadata alone for our XML searches, then www.bighooters.com is likely to be at the top of every search result no matter what the search is about.

This means there will probably be more need, not less, for traditional text searching capabilities if just to validate the metadata. Think Microsoft and Sun have a way around this? Think again.
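To make that validation idea concrete, here is a minimal sketch in Python. It assumes a made-up document layout (a `meta` block of `keyword` elements next to a `body` element) and simply checks whether each declared keyword shows up in the text it claims to describe. The element names and the `keyword_support` helper are hypothetical, and a real checker would need far more than exact word matching.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout: <doc><meta><keyword>...</keyword></meta><body>...</body></doc>
SAMPLE = """<doc>
  <meta>
    <keyword>xml</keyword>
    <keyword>metadata</keyword>
    <keyword>pizza</keyword>
  </meta>
  <body>XML carries metadata that describes the data it accompanies.</body>
</doc>"""

def keyword_support(xml_text):
    """Split declared keywords into those found in the body text and those not found."""
    root = ET.fromstring(xml_text)
    body = root.findtext("body", default="")
    # Plain old text search: the set of lowercase words that really occur in the body.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    supported, unsupported = [], []
    for kw in root.iterfind("./meta/keyword"):
        term = (kw.text or "").strip().lower()
        (supported if term in words else unsupported).append(term)
    return supported, unsupported

supported, unsupported = keyword_support(SAMPLE)
print("backed by the text:", supported)
print("claimed but absent:", unsupported)
```

In this sample the keyword "pizza" is claimed in the metadata but never appears in the body, which is exactly the kind of mismatch a text search can flag and metadata alone cannot.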

Then there is the deeper technical problem of how to generate all that metadata. This is problem number two with XML. If we rely on people to do the metadata generation, we'll be stuck in that 1930s analysis of the telephone company that concluded that eventually half of all Americans would end up working as telephone operators so the other half could order pizza. Manual systems like these don't scale well. For the phone company, the answer was direct dialing, and for XML it is the automatic generation of metadata. But such automatic generation is much harder than we generally think. It's easy to parse the data and pick out proper names and keywords, but working out the relationships among those keywords (their ontology) is beyond the ability of most programmers, much less their programs. This is the very problem that killed Artificial Intelligence in the 1980s. So while there is a tendency in the XML movement to either ignore the metadata generation problem or to blithely say it will be solved, chances are it won't be solved and we'll be left with both shallow and inconsistent metadata that limits the usefulness of XML.
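The easy half of that problem really is easy. The sketch below is a deliberately naive Python illustration, not anybody's production system: it treats capitalized words that don't start a sentence as proper names and the most frequent non-trivial words as keywords. The `shallow_metadata` function, the stopword list, and the sample text are all invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "its"}

def shallow_metadata(text, top=3):
    """Pick out capitalized mid-sentence words and the most frequent terms."""
    names = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z]+", sentence)
        # Skip the first token: it is capitalized whether or not it is a name.
        names.update(t for t in tokens[1:] if t[0].isupper())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    keywords = [w for w, _ in Counter(words).most_common(top)]
    return {"names": sorted(names), "keywords": keywords}

result = shallow_metadata("Sun sued Microsoft over Java. Microsoft settled with Sun.", top=2)
print(result)
```

The output lists Java, Microsoft, and Sun, and ranks "sun" and "microsoft" as the top keywords, but nothing in it says who sued whom or what Java had to do with it. That missing relationship is the hard part.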

The third problem isn't with XML at all, but with Microsoft. Given the company's past behavior, can they really be trusted not to screw it up? Planted in the DNA of every Microsoft executive is a desire to own technical standards, and owning standards can only come from making them proprietary. XML is good because it is generally in plain text and therefore readable by anyone and anything, which makes it harder for one outfit to claim ownership. But it can be done by embedding binary data into XML files. All you have to do is encode the binary as text (Base64 works), put it in a CDATA section, and voila, you have an XML file with parts that only Microsoft software can read because only they know the proprietary data format! The marketing line would be that it speeds up parsing or something like that, but the real reason would be so that it only works with Microsoft software. You'll recall this is just the sort of move they tried to make with Java until Sun prevailed in court.
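Here is a rough Python sketch of how such an embedded blob could look. Raw binary bytes cannot sit directly in an XML document, so the sketch assumes the blob is Base64-encoded before it goes into the CDATA section. The `order` and `payload` elements, the `vendor-x` label, and the two-field binary layout are all invented for illustration and describe no real Microsoft format.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# Hypothetical proprietary record: little-endian uint16 version, then float32 price.
blob = struct.pack("<Hf", 3, 19.5)
encoded = base64.b64encode(blob).decode("ascii")

doc = f'<order id="42"><payload format="vendor-x"><![CDATA[{encoded}]]></payload></order>'

# Any standards-compliant parser gets this far: the file is well-formed XML.
root = ET.fromstring(doc)
text = root.findtext("payload")
raw = base64.b64decode(text)
print("what everyone sees:", text, "->", raw.hex())

# Only software that knows the undocumented layout can take the next step.
version, price = struct.unpack("<Hf", raw)
print("what the vendor sees:", version, price)
```

The document parses cleanly everywhere, yet the payload is six opaque bytes to anyone without the format description. The file is open in name and closed in practice.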

The fourth problem with XML doesn't have much to do with either the standard or Microsoft, but with an inherent limitation of the World Wide Web. Though over time the Web has come to include a vast amount of information, it is far from complete, and most of the data is suspect. Who decides, within your organization, what information goes on your web page? In most U.S. companies and government agencies, it is the public relations department. PR departments typically see this function as one of selectively limiting data. They aren't ransacking the RAID farms looking for every stray bit or byte to share with the world. They are looking for only those bits that support the organization's marketing message and/or lead to compliance with the law. In any organization you can think of, the best stuff — the really juicy stuff — never makes it to the web page. So to think of the World Wide Web as the sum of human knowledge or even a tiny window on that corpus is laughable. This reality limits the usefulness of XML.

It goes on and on. XML has terrific potential, most of which is unlikely to be realized because of complexity, awkwardness, and the inherent unreliability of the institutions to which XML will be applied. But this doesn't mean XML is not worth doing, just that the dividends from Ballmer's XML Revolution are likely to be modest. It will work best in tightly defined and constrained applications where it is in the interest of all parties for the system to work. One good area to apply this is Electronic Data Interchange, or EDI.

Remember EDI? It was (is) a system for companies on any supply chain to exchange data about that supply chain no matter what databases or networks the individual participants were running. EDI was expensive and complex and never really worked very well despite the investment of billions of dollars. When companies started doing business over the internet, EDI became less relevant. But it is still around. XML will either make EDI viable on a grand scale or kill it. XML and the Internet eliminate many of EDI's problems. It will be interesting to see what happens.

XML has the ability to be the bridge between different applications at different companies. With it, Airline A can interact with Airline B's reservation system. Yes, I know most airlines already have this ability, but XML is a more elegant and extensible way of doing it. I know of an XML bridge that links help desks between two companies. A ticket in one company's network queue is transferred to the other company's ticket system. When that ticket is closed, the closed transaction is replicated back to the originating ticketing system. This bridge concept is being expanded to handle database transactions. Suppose two companies want to partner or merge. One uses Oracle, the other something else. With XML you can actually bridge the two and make them appear to work as one. But given the extra layer of processing, this use of XML is just a stopgap until the databases can actually be unified.
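The help desk bridge can be sketched in a few lines of Python. This is only a guess at the general shape, not the actual system mentioned above: the `Ticket` fields, the XML layout, and the in-memory dictionary standing in for the originating ticket database are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: str
    queue: str
    summary: str
    status: str = "open"

def to_xml(ticket):
    """Serialize a ticket into the (hypothetical) XML format both sides agree on."""
    el = ET.Element("ticket", id=ticket.ticket_id, status=ticket.status)
    ET.SubElement(el, "queue").text = ticket.queue
    ET.SubElement(el, "summary").text = ticket.summary
    return ET.tostring(el, encoding="unicode")

def from_xml(xml_text):
    """Rebuild a ticket from the shared XML format."""
    el = ET.fromstring(xml_text)
    return Ticket(el.get("id"), el.findtext("queue"), el.findtext("summary"), el.get("status"))

# Company A: open a ticket and send it across the bridge.
origin_db = {}
ticket = Ticket("A-1001", "network", "Link down between sites")
origin_db[ticket.ticket_id] = ticket
wire = to_xml(ticket)

# Company B: import the ticket into its own system, work it, close it.
remote = from_xml(wire)
remote.status = "closed"
reply = to_xml(remote)

# Company A: replicate the closed transaction back to the originating system.
closed = from_xml(reply)
origin_db[closed.ticket_id].status = closed.status
print(origin_db["A-1001"])
```

Neither side needs to know what database or ticketing product the other runs. They only have to agree on the XML in the middle, which is also why every exchange pays for an extra round of serializing and parsing.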

So there are lots of good uses for XML, but those uses are either narrow or shallow. For the four reasons mentioned above, XML — like most other grand and complex technical initiatives — is likely to be somewhat of a disappointment.
