I, Cringely - The Survival of the Nerdiest with Robert X. Cringely

The Pulpit


Weekly Column

Azul Means (Big) Blue: There's a new kind of mainframe coming and it isn't from IBM.

Status: [CLOSED] comments (50)
By Robert X. Cringely

In a triumph of PR right up there with suggesting that Intel executives ever badgered Microsoft executives into doing anything, IBM this week introduced a new generation of mainframe computers. The IBM System z10 is smaller, faster, cooler, has more memory, more storage -- more of everything in fact -- and all that is crammed into less of everything than was the case with the z9 machine it replaces. Touted as more of a super-duper virtualization server than traditional big iron, the only problem with the z10 is that every bit of its superior performance can be easily attributed to Moore's Law. The darned thing actually should be faster than it is. There's a mainframe revolution going on all right, but it's not at IBM. The real mainframe revolt is taking place inside your generic Linux box, as well as at an outfit called Azul Systems.

I'm perfectly happy for IBM to introduce a great new mainframe computer. It's just that the 85 percent faster, 85 percent smaller and a little bit cheaper z10 is coming three years after the z9, and Moore's Law says sister machines that far apart ought to be 200 percent faster, not 85 percent -- a fact that IBM managed to ignore while touting the new machine's unsubstantiated equivalence to 1,500 Intel single-processor servers.
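The Moore's Law comparison is easy to check with a few lines of arithmetic. This is a rough sketch that assumes the popular reading of Moore's Law as a performance doubling every fixed period (18 and 24 months are both commonly quoted; the column does not say which one it uses). The function names are made up for illustration.

```python
import math

def expected_speedup(years, doubling_period_years):
    """Speed multiple expected if performance doubles every doubling_period_years."""
    return 2 ** (years / doubling_period_years)

def percent_faster(multiple):
    """Convert a speed multiple (1.85x) into 'percent faster' (85)."""
    return (multiple - 1) * 100

def implied_doubling_period(years, multiple):
    """Doubling period that would yield the given multiple over the given span."""
    return years / math.log2(multiple)

if __name__ == "__main__":
    # Assumed doubling periods, in years; neither figure comes from the column.
    for period in (1.5, 2.0):
        multiple = expected_speedup(3, period)
        print(f"doubling every {period} years: {multiple:.2f}x, "
              f"{percent_faster(multiple):.0f} percent faster")
    # The z10 is described as 85 percent faster, i.e. 1.85x, after three years.
    print(f"1.85x over 3 years implies doubling every "
          f"{implied_doubling_period(3, 1.85):.1f} years")
```

Under these assumptions a three-year gap should yield somewhere between roughly 2.8x and 4x. The column's "200 percent faster" (3x) sits inside that range, while 1.85x corresponds to a doubling period of more than three years.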

Where were the hard questions? Did anyone do the math? The tricked-out z10 that's the supposed equivalent of 1,500 beige boxes costs close to $20 million, which works out to $13,333 per beige box -- hardly a cost savings. Even taking into account the data center space savings, power savings, and possibly (far from guaranteed) savings on administration, the z10 really isn't much of a deal unless you use it for one thing and one thing only -- replacing a z9.
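For anyone who wants to redo the math, here is a small sketch. The $20 million price and the 1,500-server equivalence come from the paragraph above; the commodity server price in the usage example is a purely hypothetical figure, and the function names are illustrative.

```python
Z10_PRICE_USD = 20_000_000          # "close to $20 million" per the column
CLAIMED_SERVERS_REPLACED = 1_500    # IBM's claimed equivalence

def cost_per_replaced_server(system_price, servers_replaced):
    """Mainframe price spread evenly over the servers it is said to replace."""
    return system_price / servers_replaced

def required_savings_per_server(system_price, servers_replaced, commodity_server_price):
    """Space, power and admin savings each replaced server must deliver
    before the mainframe breaks even against buying commodity boxes."""
    per_server = cost_per_replaced_server(system_price, servers_replaced)
    return per_server - commodity_server_price

if __name__ == "__main__":
    per_box = cost_per_replaced_server(Z10_PRICE_USD, CLAIMED_SERVERS_REPLACED)
    print(f"cost per replaced server: ${per_box:,.0f}")
    # Hypothetical $3,000 beige box; substitute a real quote.
    gap = required_savings_per_server(Z10_PRICE_USD, CLAIMED_SERVERS_REPLACED, 3_000)
    print(f"savings needed per server to break even: ${gap:,.0f}")
```

The second function makes the real question explicit: how much data center space, power and administration does each beige box have to save before the big machine pays for itself?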

So the newfangled mainframe is really just an oldfangled mainframe after all, which I am sure is comforting for folks who like to buy oldfangled mainframes.

But those sketchily described IBM benchmarks are, themselves, dubious. IBM never fully explains its own benchmarks nor does it even allow others to benchmark IBM mainframe computers. So nobody really knows how fast the z10 is or how many Intel boxes it can replace if those boxes are actually DOING something.

Remember the stories folks like me wrote a few years back about an earlier IBM mainframe running 40,000+ instances of SUSE Linux under VM on one machine? I wonder how many of those 40,000 parallel Linux images were simultaneously running Doom? My guess is none were.

Far more interesting to me is the vastly increasing utility of Linux as what I would consider a mainframe-equivalent operating system, primarily due to the open source OS's newfound skill with multiple threads that goes a long way toward making efficient use of those multi-core processors we all are so excited to buy yet barely use.

As I wrote a few weeks ago in a column on semiconductor voltage leakage of all things, all this multi-core stuff is really about keeping benchmark performance up while keeping clock speeds down so the CPUs don't overheat. Unlike the benchmark programs, most desktop applications still run on a single processor core and have no good way to take efficient advantage of this extra oomph.

But that's changing. Linux used to be especially bad at dealing with multiple program threads for example -- so bad the rule of thumb was it simply wasn't worth even trying under most conditions. But that was with the archaic Linux 2.4 kernel. Now we have Linux 2.6 and a new library called NPTL or Native POSIX Thread Library to change all that.

NPTL has been in the enterprise versions of Red Hat Linux for a while, but now it is here for the rest of us, too. With NPTL, hundreds of thousands of threads on one machine are now very possible. And where it used to be an issue when many threads competed for data structures (think about 1,000 threads all trying to update a hash table), we now have data structures where no thread waits for any other. In fact, if one thread gets swapped out before it's done doing the update, the next thread detects this and helps finish the job.
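The "no thread waits for any other" idea usually rests on an optimistic retry loop around an atomic compare-and-swap (CAS) instruction. The sketch below shows only that loop. Python has no CAS primitive, so the `AtomicCell` class is a hypothetical stand-in that emulates one with a tiny internal lock; real lock-free structures use the hardware instruction directly, and the "helping" step described above is not shown here.

```python
import threading

class AtomicCell:
    """Stand-in for a machine word with an atomic compare-and-swap.

    ASSUMPTION: the private lock only emulates the atomicity that the CPU
    would provide. It is held for the single compare-and-store step,
    never across a whole update.
    """
    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store new only if the cell still holds expected; report success."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(cell):
    """Optimistic update: read, compute, publish only if nobody got there first."""
    retries = 0
    while True:
        seen = cell.load()
        if cell.compare_and_swap(seen, seen + 1):
            return retries
        retries += 1  # another thread won the race; re-read and try again

def run(threads=8, increments_per_thread=1000):
    """Have several threads hammer one cell and return the final value."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(increments_per_thread):
            increment(cell)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.load()

if __name__ == "__main__":
    print(run())
```

The point of the pattern is that a thread that loses the race does not block; it simply re-reads the current value and retries, so a thread that gets swapped out mid-update cannot stall the others.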

The upshot is superior performance IF applications are prepared to take advantage.
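One kind of application that is prepared to take advantage is a server that gives each connection its own thread, where most threads spend their time waiting rather than computing. The sketch below imitates that with Python threads that sleep in place of waiting on the network. The connection count and wait time are arbitrary assumptions, and Python threads are not NPTL-specific, so treat this as an illustration of the workload shape, not a benchmark.

```python
import threading
import time

def serve_idle_connections(connections=400, wait_seconds=0.05):
    """Run one thread per simulated connection and report what it cost.

    Returns (connections handled, wall-clock seconds, CPU seconds).
    """
    handled = []
    handled_lock = threading.Lock()

    def handle(conn_id):
        time.sleep(wait_seconds)  # stands in for waiting on network I/O
        with handled_lock:
            handled.append(conn_id)

    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    threads = [threading.Thread(target=handle, args=(i,)) for i in range(connections)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return len(handled), wall, cpu

if __name__ == "__main__":
    count, wall, cpu = serve_idle_connections()
    print(f"{count} connections, {wall:.2f}s wall clock, {cpu:.2f}s CPU")
```

Because the threads overlap their waiting, the wall-clock time stays far below the sum of the individual waits, and the CPU time is a small fraction of the total. That is the same shape as thousands of connections keeping a handful of cores nearly idle.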

"My e-mail application runs on a four-core Opteron server," says a techie friend of mine, "but I've seen it have over 4,000 simultaneous connections -- 4,000 separate threads (where I'm using 'thread' to describe a lightweight process) competing for those four CPUs. And looking at the stats, my CPUs are running under five percent almost all the time. This stuff really has come a long way."

But not nearly as far as Azul Systems has gone in ITS redefinition of the mainframe. As far as I can tell, Azul has pushed models for thread management and process concurrency further than any other company.

Azul makes custom multi-core server appliances. You can buy a 14U Azul box with up to 768 processor cores and 768 gigabytes of memory. The processors are of Azul's own design, at least for now.

But what's a server appliance? In the case of Azul, the appliance is a kind of Java co-processor that sits on the network providing compute assistance to many different Java applications running on many different machines.

Java has always been a great language for writing big apps that can be virtualized across a bunch of processors or machines. But while Java was flexible and elegant, it wasn't always very fast, the biggest problem being processor delays caused by Java's automatic garbage collection routines. Azul handles garbage collection in hardware rather than in software, making it a continuous process that keeps garbage heap sizes down and performance up.

Language geeks used to sit around arguing about the comparative performance of Java with, say, C or C++ and some (maybe I should actually write "Sun") would claim that Java was just as fast as C++. And it was, for everything except getting work done because of intermittent garbage collection delays. Well now Azul -- not just with its custom hardware but also with its unique multi-core Java Virtual Machine -- has made those arguments moot: Java finally IS as fast as C++.

For that matter, there is no reason to believe that Azul's architecture has to be limited to Java. It could be extended to C++, too.

To me what's exciting here is Azul's redefinition of big iron. That z10 box from IBM, for example, can look to the network like 1,500 little servers running a variety of operating systems. That's useful to a point, but not especially flexible. Azul's appliance doesn't replace servers in this sense of substituting one virtualized instance for what might previously have been a discrete hardware device. Instead, Azul ASSISTS existing servers with their Java processing needs with the result that fewer total servers are required.

Servers aren't replaced, they are made unnecessary at a typical ratio of 10-to-one, according to Azul. So what might have required 100 blade servers can be done FASTER (Azul claims 5-50X) with 10 blade servers and an Azul appliance. Now that Azul box is not cheap, costing close to $1,000 per CPU core, but that's comparable to blade server prices and vastly cheaper than mainframe power that isn't nearly as flexible.
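Azul's numbers can be put side by side with the same kind of arithmetic. The 768-core maximum, the roughly $1,000-per-core price and the 10-to-one ratio are the figures quoted in this column (the ratio is Azul's own claim); the function names are illustrative and nothing here has been measured.

```python
import math

AZUL_MAX_CORES = 768             # largest box mentioned in the column
AZUL_PRICE_PER_CORE_USD = 1_000  # "close to $1,000 per CPU core"
CLAIMED_CONSOLIDATION_RATIO = 10 # Azul's "typical" 10-to-one claim

def appliance_price(cores, price_per_core):
    """Approximate appliance price from its core count."""
    return cores * price_per_core

def servers_after_consolidation(servers_before, ratio=CLAIMED_CONSOLIDATION_RATIO):
    """Servers still needed once the appliance carries the Java load."""
    return math.ceil(servers_before / ratio)

if __name__ == "__main__":
    price = appliance_price(AZUL_MAX_CORES, AZUL_PRICE_PER_CORE_USD)
    print(f"fully loaded appliance: about ${price:,}")
    print(f"100 blades shrink to {servers_after_consolidation(100)} plus the appliance")
```

At those prices a fully loaded box comes to well under $1 million, which is the basis for calling it vastly cheaper than a $20 million mainframe.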

And flexibility is what this is all about, because Azul's assistance is provided both transparently and transiently. Java apps don't have to be rewritten to accept assistance from the Azul appliance. If it is visible on the network, the appliance can assist ANY Java app, with that assistance coming in proportion to the amount of help required based on the number of cores available.

Now imagine how this would work in a data center. Unlike a traditional mainframe that would take over from some number of servers, the Azul box would assist EVERY server in the room as needed, so that you might need a big Azul box for every thousand or so servers, with that total number of servers dramatically diminished because of the dynamically shared overhead.

This is simply more efficient computing -- something we don't often see.

There are other concurrency architectures out there like Appistry (which I wrote about back when it was called Tsunami before we unfortunately HAD a Tsunami -- what sort of marketing bad luck is that?). But where Appistry spreads the compute load concurrently across hundreds or thousands of computers, Azul ASSISTS hundreds or thousands of servers or server images with their compute requirements as needed.

Bear Stearns runs its back office with Azul assistance, but many customers use Azul boxes to accelerate their websites.

Since I am not a big company guy who cares very much about what big companies do, what I find exciting about Azul's approach is how it could be applied in the kinds of data centers where I typically rent either virtual or dedicated servers. If an Azul box were installed on that network, my little Java app could instantly and mysteriously run up to 50 times faster, if Azul's numbers hold.


Comments from the Tribe


Every few years, someone resurfaces the idea that mainframes will die or have died because it is less expensive to aggregate a bunch of processors and interconnect them with a network or a "fast" switch. This started with minicomputers in the 1970s and really gained momentum with UNIX and Sun's "The Network is the Computer" campaign of the late '80s and early '90s. The emergence of parallel supercomputers in the early '90s, like Teradata's business intelligence engine and IBM's Scalable Parallel SP machines in the early to mid '90s, added to its momentum. In the 1990s, "Client Server" computing kept it going. However, in the late '90s, the operational costs of distributed networks grew rapidly as admin, network, software, environmental and outage costs began to supersede the price of hardware. And so the notion of the "mainframe" as a viable solution for consolidated workloads began to re-emerge. Ultimately this led to the current wave of virtualization and consolidation, in which Linux on System z, and z/OS on System z with "specialty engines," participate. At the same time the supercomputers and highly parallel machines got commoditized, leading to blades and Intel/Linux based supercomputers like "Blue Gene"; and configurations of commodity servers emerged in places like Google.

The twin trends of commoditization/parallelization and consolidation/virtualization continue. Neither type of machine configuration will die. The very latest technology will be put into both commodity style Google-esque configurations and mainframes. To characterize the mainframe with the caricature of "old fashioned hardware" that will be swept away by the "modern" notion of distributed, parallel processing is really no different than the caricature of the massively parallel commodity machines as a "solution looking for a problem" that will ultimately be consolidated into more "sensible" classic mainframe-like designs. The reality is that both types of machines will continue to be of great use, and cross pollinate as time goes on. As for z10 being a PR triumph, there is plenty of hype from all corners of the server business, some of which is self-evident in the responses to your article. The very benchmarks that you talk about are a significant part of it. Generally speaking, they are far more parallel than many of the workloads actually run in IT. They are designed to show the advantages of highly parallel, highly distributed configurations. In fact it is fairly easy to see that the marketing agenda of all the vendors who wanted to prove that distributed client-server computing is better than the "old fashioned mainframe computing" is written all over the benchmark definitions. Think about it, the mainframe gets about 1/3 of IBM's vote on both TPC and SPEC. Whose marketing agenda do you think the industry standard benchmarks follow? I hope that your readers are not naive enough to assume that these benchmarks are pure technical works after an objective "truth." They are not.

The end result is that, over time, the many machines (including IBM's Power and x) have all been designed to that benchmark set and the mainframe has not. When the work does not match the parallel paradigm, the benchmarks fail to represent reality, and their ability to show relative capacity falls apart. Since the mainframe is designed for the more serialized environments of much of the work done in IT today, the benchmarks do not represent its relative capacity. Thus it is not worth the expense to run and publish them. IBM will stipulate that when industry standard benchmarks are an accurate representation of real work, System p is by far our superior offering. We know that many workloads are not well represented by the benchmarks. Many of those workloads depend on low latency to shared data, whereas the benchmarks represent workloads where capacity depends on high bandwidth and low contention for "thread private" data. We also know that System z is the only machine designed specifically for large instances of the former workloads. The mainframe is also designed to run a mix of workloads which share files in memory. None of the benchmarks drives this. This mixed workload optimization is present in its virtualization implementations, PR/SM and VM, as well as in z/OS. Some of those characteristics also rub off on Linux on z.

One could argue that converting workloads to a more parallel form is a "small matter of programming". In fact such a programming revolution has been advocated by many for the last 35 years. While many new parallel codes have emerged, they have not displaced the SMP shared data programming model. Given the rising expense of programmers and the falling expense of hardware and software (even on mainframes), I would expect that the legacy codes will continue to drive both paradigms forward for many years to come. Fantasies abound on both sides of the issue. The fact is that technology and legacy will continue to grind away, creating a reality that doesn't live up to any of them. What is important is matching machines to the work they are going to do, not some idealized notion of which is "better", "faster", "more advanced", "mature" or "modern".

High core count designs like Azul, and Sun's Niagara machine, are extreme examples of parallel designs. Azul takes the idea a step further in making engines which are designed to run Java byte code in a native way. This is an example of specialized optimization for an appliance that performs very well for a particular type of load. IBM's Cell processor is another example. These types of appliance engines will proliferate and get better at their tasks. Over time, the mainframe and other general purpose machines will find ways to integrate appliances into their configurations on better connections. A current example is the HOPLON game configuration, which teams a mainframe data server and administration with Cell processors to do the physics.

Using System z for consolidation is not a blanket solution. To use your example, servers running Doom would not be included in the consolidation unless they were running at very low utilization; and probably not then without the assist of something like Cell, à la Hoplon. On the other hand, there are successful consolidations of Oracle databases on z Linux such as was done at the Government of Quebec. While the individual servers involved were not terribly busy, one could not say that they were idle, either. Such consolidation would not have taken place if the economics were simply based on hardware price and how fast the engines run. There is enough of this kind of opportunity for System z and its customers to keep Linux on z growing. There are many factors involved in server selection processes. The price/performance of the hardware is almost never the sole driver, and often it is not the principal driver, either.

Others have made points about Azul's finances. Regardless of what happens, such machines will continue to emerge. My guess is that none will "take over." The power of legacy will remain strong, unless someone comes up with an inexpensive replacement for programmers. Also, their reach will be limited by the specialized optimization that is inherent in their designs. Each of the "current" machine types has had years to create a legacy. The mainframe is the granddaddy of them all, but it is less than 10 years older than UNIX, and only 20 years older than Windows. All three are middle aged, in that they are younger than you and I (well me, anyway). I used to say that Linux was the only one that had not yet reached drinking age, but it, too, is now approaching "young adulthood," creating a legacy of its own. That legacy is quite pervasive, with applications running on embedded processors, clients, blades, Intel and UNIX servers, supercomputers and mainframes. It falls on both sides of the parallel/distributed versus virtual/consolidated divide. Some applications can reside on either side as well. Others are too heavily optimized for one side or the other.

I suggest that you read two books: The first is In Search of Clusters by Greg Pfister (Prentice Hall). The second is Guerrilla Capacity Planning by Neil J. Gunther (Springer). Both authors have an appreciation for how workload affects machines, the applicability of benchmarks, and how relative capacity is not simply governed by "Moore's Law" which, by the way, applies only to microprocessor density and not to systems capacity.

Joe Temple
IBM Distinguished Engineer

Joe Temple | Mar 04, 2008 | 6:49PM


Joe Temple is right. The mainframe or primary multi-function server is not dead and will never die. The history of computing shows us that we had only one good technique with centralized computing in the early days and then more options became available to the designer as new technologies emerged. Today we have many design options for the corporate IT designer/builder/re-modeler. IMHO, it's pretty safe to assume that tomorrow we will have more to choose from. On the other hand, none of the options have died; they have just transformed themselves and been optimized.

As each new design alternative has arisen, the share of the others has changed to accommodate the total market. Sometimes the market size changes dramatically, as it did with the advent of the mini-computer and the microprocessor server.

History, however, does bring out several salient trends we must not ignore. In the beginning, there was the mainframe and it had the entire market share. As each new technological option has arisen and taken its place in the minds of corporate IT designers, the mainframe has steadily lost share and has never truly grown back to its pre-eminent past. Yes, maybe MIPS, prices and shipment counts change for a while, but as each technology option emerges, the original option, the centralized mainframe, inexorably loses just a little more in the eyes of the market. IBM has very aptly re-heated the mainframe to make it appear to espouse many of the past emerging options. But just as Joe talks about clusters and different types of computing having different characteristics, we must always remember that despite all the open systems BS and the rhetoric of IBM marketing, a proprietary OS mainframe is a mainframe nevertheless. And these words are coming from a veteran who survived OS/MVT, DOS, FS, Pacific, STS, Fort Knox and a thousand other IBM nightmares in my decades at the Blue Pig.

So the mainframe will never die, but its future will never be as bright as it was at the beginning. It's like a collapsing supernova, getting denser, stronger, more gravitationally entrapping and better as it collapses and successfully services a smaller yet more loyal market following of technological religious zealots. It can be re-packaged and re-branded with more "in" features and "cool" technologies, but it's still the same old concept and it still carries its proverbial technological baggage from the past.

As I have stated before, most humans in technology have a personal defining "moment" or a time when they were tested and triumphed. Unfortunately, it is normal human behavior that, once it has been intellectually tested, the aging technologist's mind kinda calcifies, begins to resist change and closes itself to radical new design. This is why many embrace the mainframe, the mini-computer, Microsoft and yes, even Java and Linux. Once it works, why take a risk again and change? (BTW, this is a great story for another time...where are the true innovators and risk takers in corporate IT?)

IBM is a brand that has successfully acquired strength and staying power out of this normal human behavior. It has built a culture around the mainframe and other areas of the IT business, much like Apple. Nothing wrong with that, but just as the alternatives like Azul, Linux, etc. shouldn't kill the mainframe and make IBM irrelevant, neither can IBM claim that its financial status, its history and technological options kill or reduce the potential of others that are just starting out in the market. That's just FUD.

The true top designers of the future will look at BUSINESS requirements, accept the limitations of past investments and technological decisions and move in the right application direction. As Joe Temple aptly stated, some applications are better in centralized environments versus highly parallelized and clustered ones. What was forgotten in that statement is that the true IT designer should be willing to change directions and mold the future applications to another technological model if the business case makes it possible and demands it.

The future could be a boring set of SaaS or ERP applications where IT is not a business differentiator, but a utility or service. I think not. I know there are smart people out there who realize that if everyone is the same then there's no differentiator for a business to succeed.

Vive la différence. We are all zealots in a variety of churches of technology. Just don't commit the sin of calling others sinners without accepting your own sins as well!

The Armonk Anti-Christ

Armonk Anti-Christ | Mar 05, 2008 | 12:14AM

Mikey said:

I don't think using the name tsunami was bad luck. It was bad marketing.

You know, I'd have to agree 100% ... all I can say is it made a lot more sense after a couple of bottles of wine and some good bbq back when we were getting started ... at least we had actual marketing folks around to make up the new name (Appistry) when necessity dictated!

On to the main discussion ... Armonk makes a very good point when he argues that

IBM is a brand that has successfully acquired strength and staying power out of this normal human behavior. It has built a culture around the mainframe and other areas of the IT business, much like Apple.

But more than that - as he and several others have indicated in these comments, there will be an enduring place for hyper-optimized, low-volume, very capable boxes like the z10.
It's just that the place in question won't be nearly as broad as it has been historically.
In any case, I've expanded on this point in this post.

- Bob Lozano

Co-founder, Appistry

Bob Lozano | Mar 05, 2008 | 11:54AM