High-performance computing: supercharging the enterprise
Leon Erlanger, Information Age
14/02/2006 11:44:45
Merlin Securities, a new prime brokerage providing trading, financing, portfolio analysis, and reporting for multibillion-dollar hedge funds, needed a competitive edge. Its larger rivals, such as Bank of America, Morgan Stanley and UBS, had the advantage of expensive mainframes that could consolidate and analyse millions of trades each day and, through overnight batch processing, deliver reports the next morning, with performance measured on a monthly basis.
So Merlin outclassed its competitors by returning trade performance information in near real time, measuring performance daily and attributing it on multiple levels, including comparisons with other securities in a market sector, numerous benchmarks, and other traders in the firm. What's more, it did so using an inexpensive compute cluster made up of four dual-processor Dell PowerEdge 2850 servers.
Merlin's story is a perfect example of where the HPC (high-performance computing) market stands today. Multimillion-dollar systems from Cray, Fujitsu, IBM and NEC are rapidly giving way to clusters or grids of inexpensive x86 servers from mainstream vendors such as Dell, Hewlett-Packard and IBM. Meanwhile, the specialised operating systems of yesterday have largely been replaced by Linux.
"The HPC market has been turned on its ear," says Earl Joseph, research vice president for high-performance systems at IDC. "Cray, NEC and Fujitsu now make up less than one per cent, while HP and IBM are at about 31 per cent each, with Sun at 15 per cent and Dell at 8.5 per cent."
Up-and-coming Linux hardware vendors such as Rackable Systems and Verari Systems have begun selling scads of standards-based server clusters into the traditional HPC/technical computing markets, such as higher education, the life sciences, oil and gas, and industrial design. More importantly, however, inexpensive HPC is finding its way into much smaller environments than before, as well as into financial, search, and logistics applications previously outside its province.
"The HPC market has shown more growth than any other IT sector," says IDC's Joseph, "up 49 per cent in the past two years." Does this mean that high-performance computing is coming to hundreds of enterprise datacentres near you? It depends on whom you ask.
"The future server architecture for the entire enterprise is a cluster," says Songnian Zhou, CEO and co-founder of grid/clustering software vendor Platform Computing. "The server is just a component; the server operating system a device driver. The real OS will be a layer of resource scheduling and allocation software."
But Frank Gillett, principal analyst at Forrester Research, is not quite as bullish. "High-performance clustering may be getting cheaper, but taking advantage of it is not really getting easier," he says. "Clustering will become accessible at the rate at which software is written for the architecture, and it will be quite a while [before] that's all sorted out."
For example, Merlin Securities built a high-performance cluster out of standard, relatively inexpensive hardware, but the application had to be developed in-house from the ground up.
Clusters and grids
The basic premise of HPC is simple. Instead of running compute-intensive applications on one large, specialised system, high-performance clusters and grids divide up the processing load among anywhere from two to thousands of separate servers, workstations, or even PCs. The actual architecture used, however, will vary depending on the nature of the application and where the hardware resides.
Forrester Research divides clustering and grid-computing architectures into three categories: uncoupled, loosely coupled, and tightly coupled. The uncoupled architecture, best exemplified by Web server load balancing, is more relevant to handling streams of small requests than to HPC applications.
In the loosely coupled architecture, a workload scheduler, usually running on a head server, splits large application requests into many smaller, independent tasks that can run in parallel and distributes them, along with small amounts of data, among the servers making up the cluster. Depending on the application, the job management software may also have to aggregate the results.
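The scheduler's role can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's API: the function names are hypothetical, a thread pool stands in for the servers in the cluster, and a sum of squares stands in for a real compute-intensive task. The head-node function partitions the request, dispatches each independent chunk, and aggregates the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_request(start: int, stop: int, n_tasks: int) -> list:
    """Partition the range [start, stop) into n_tasks contiguous, non-overlapping chunks."""
    size, extra = divmod(stop - start, n_tasks)
    chunks, lo = [], start
    for i in range(n_tasks):
        hi = lo + size + (1 if i < extra else 0)  # spread any remainder over the first chunks
        chunks.append((lo, hi))
        lo = hi
    return chunks


def run_task(chunk: tuple) -> int:
    """One independent task: it needs only its own chunk, nothing from other tasks."""
    lo, hi = chunk
    return sum(x * x for x in range(lo, hi))


def schedule(start: int, stop: int, n_nodes: int) -> int:
    """Head-node role: split the request, dispatch the tasks, aggregate the results."""
    chunks = split_request(start, stop, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:  # threads stand in for servers
        partials = list(pool.map(run_task, chunks))
    return sum(partials)


print(schedule(0, 1_000, 4))  # prints 332833500
```

In a real loosely coupled cluster each chunk, with its small slice of data, would travel over the network to a separate server; the partition, dispatch, and aggregate structure stays the same.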
For this scenario to work, the partitioned tasks must have high compute-to-data ratios and no interdependencies or order-of-execution requirements. One good example is a query search against a huge database in which the query is run concurrently against many separate database fragments. According to Forrester, this method is appropriate for such tasks as mathematically intensive risk calculations, engineering design automation and simulation, life sciences, pharmaceutical tasks such as protein folding, and animated film rendering.
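A query run against database fragments follows the split, run, and merge pattern described above. The sketch below is hypothetical: the records are made-up (id, amount) tuples, fragments are created round-robin, and threads stand in for cluster nodes. Because each fragment is searched independently, with no interdependencies or ordering constraints, the merged result matches a search of the whole data set.

```python
from concurrent.futures import ThreadPoolExecutor


def fragment(records: list, n_fragments: int) -> list:
    """Deal records round-robin into n_fragments disjoint fragments."""
    return [records[i::n_fragments] for i in range(n_fragments)]


def search_fragment(frag: list, predicate) -> list:
    """Run the same query against one fragment; no data is shared between fragments."""
    return [r for r in frag if predicate(r)]


def parallel_query(records: list, predicate, n_fragments: int = 4) -> list:
    frags = fragment(records, n_fragments)
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        hits = list(pool.map(search_fragment, frags, [predicate] * n_fragments))
    # The fragments are disjoint, so concatenation neither loses nor duplicates rows;
    # sorting restores a deterministic order (by id, the first tuple element).
    return sorted(r for part in hits for r in part)


trades = [(trade_id, (trade_id * 37) % 101) for trade_id in range(50)]  # made-up (id, amount) rows
big = parallel_query(trades, lambda row: row[1] > 90)
```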
Hewitt Associates, a global human resources outsourcing and consulting firm, uses a loosely coupled cluster to process what-if scenarios for its defined benefit (pension) plans on its customer Web site. These calculations can be numerically intensive, depending on the customer's assumptions and the number of plan renegotiations, mergers, and acquisitions that occurred during an employee's term of employment. With help from IBM and grid software vendor DataSynapse, Hewitt was able to split off the most intense calculations to its cluster of Intel-based blade servers, now approaching 40 in number.
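The kind of split Hewitt made, keeping light requests on the Web tier and sending heavy ones to the cluster, might look something like the following. This is purely an illustrative assumption, not Hewitt's or DataSynapse's actual logic: the request fields, the cost model, and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WhatIfRequest:
    customer_id: str
    plan_events: int  # hypothetical: renegotiations, mergers, acquisitions in the record
    scenarios: int    # hypothetical: number of what-if variants requested


def estimated_cost(req: WhatIfRequest) -> int:
    """Hypothetical cost model: work grows with plan events and scenarios."""
    return (1 + req.plan_events) * req.scenarios


def route(req: WhatIfRequest, threshold: int = 100) -> str:
    """Keep cheap requests on the Web tier; offload expensive ones to the cluster."""
    return "cluster" if estimated_cost(req) > threshold else "web"


print(route(WhatIfRequest("c-001", plan_events=0, scenarios=10)))  # prints web
print(route(WhatIfRequest("c-002", plan_events=9, scenarios=20)))  # prints cluster
```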
Although many installations of this type consist of a single dedicated departmental or datacentre server cluster, another way to implement low-cost HPC is to distribute work across a number of shared, geographically dispersed resources in what is known as a grid. A grid can run across a few company departments or datacentres, or it can cross company boundaries to partner sites and service providers. For example, Nationwide Financial takes advantage of a concept called cycle harvesting, in which desktop PCs and workgroup servers are activated for grid computing when they are idle.
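Cycle harvesting boils down to a scheduler that hands work only to machines it judges to be idle. The sketch below assumes a hypothetical idleness test (no user activity and CPU load at or below 10 per cent) and hypothetical machine records; real harvesting products apply their own policies.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    cpu_load: float     # fraction of CPU in use, 0.0 to 1.0
    user_active: bool   # hypothetical signal, e.g. recent keyboard or mouse input
    queue: list = field(default_factory=list)


def is_idle(m: Machine, max_load: float = 0.10) -> bool:
    """Hypothetical idleness test: nobody at the machine and the CPU mostly unused."""
    return not m.user_active and m.cpu_load <= max_load


def harvest(tasks: list, machines: list) -> list:
    """Deal tasks round-robin to idle machines only; return any tasks left unassigned."""
    idle = [m for m in machines if is_idle(m)]
    if not idle:
        return list(tasks)  # nothing to harvest right now; try again later
    for i, task in enumerate(tasks):
        idle[i % len(idle)].queue.append(task)
    return []


office = [
    Machine("pc1", 0.05, False),
    Machine("pc2", 0.80, False),  # busy running its owner's work
    Machine("pc3", 0.02, True),   # owner is at the keyboard
    Machine("srv1", 0.00, False),
]
leftover = harvest(list(range(5)), office)
```

A production harvester would also have to suspend or migrate work when a PC's owner returns, which is one reason guaranteed quality of service is hard to achieve this way.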
Today, however, dedicated clusters are by far the most common scenario. "When I go out and talk to people, I see lots of dedicated clusters running a single application, only a handful of shared grids spanning multiple geographies, and no examples of grids spanning multiple firms," Forrester's Gillett says.
Cycle harvesting, in particular, is largely limited to universities, government research agencies, and altruistic grids such as Parabon Computation's Compute Against Cancer project. "Most Fortune 1000 companies need to get the job done with a guaranteed quality of service, and you're not going to get that with cycle harvesting," says Verari CEO David Driggers.
And then there are some scenarios for which the entire loosely coupled clustering paradigm is unsuited. According to Forrester Research, applications such as weather forecasting, seismic analysis, and fluid analysis run interdependent calculations that require message passing among cluster nodes during job execution, which means they need a more tightly coupled architecture.
A small matter of software
In contrast to the simple, Gigabit Ethernet-connected designs of loosely coupled clusters, tightly coupled systems typically use some incarnation of the MPI (Message Passing Interface) standard to communicate between processes, and the clustered servers are linked with a high-speed interconnect such as InfiniBand, Myricom Myrinet, or Quadrics QsNet. Applications usually have to be heavily modified or written from the ground up for tightly coupled HPC, although some vendors, such as Virtual Iron, offer virtualisation software that they claim allows server applications to run unmodified across the cluster.
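Why tightly coupled applications need message passing shows up clearly in a simple one-dimensional diffusion calculation split across nodes. Each node owns a slice of the array, but at every time step it needs the edge values of its neighbours' slices before it can update its own. The sketch below simulates that exchange in plain Python; it does not use MPI (in an MPI program the exchange would typically use point-to-point calls such as MPI_Sendrecv). The array size, number of nodes, and diffusion coefficient are arbitrary assumptions, and the array length is assumed to divide evenly among the nodes.

```python
def step_global(u: list, alpha: float = 0.25) -> list:
    """Reference: one explicit diffusion step on the whole array; both ends held fixed."""
    interior = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def split(u: list, n_nodes: int) -> list:
    """Give each node an equal, contiguous slice (assumes len(u) divides evenly)."""
    assert len(u) % n_nodes == 0
    size = len(u) // n_nodes
    return [u[k * size:(k + 1) * size] for k in range(n_nodes)]


def step_distributed(parts: list, alpha: float = 0.25) -> list:
    n = len(parts)
    # Communication phase: each node receives one "ghost" value from each neighbour.
    left_ghost = [parts[k - 1][-1] if k > 0 else None for k in range(n)]
    right_ghost = [parts[k + 1][0] if k < n - 1 else None for k in range(n)]
    # Computation phase: each node updates its slice using local data plus ghosts.
    new_parts = []
    for k, p in enumerate(parts):
        new = []
        for j in range(len(p)):
            if (k == 0 and j == 0) or (k == n - 1 and j == len(p) - 1):
                new.append(p[j])  # global boundary value stays fixed
                continue
            left = p[j - 1] if j > 0 else left_ghost[k]
            right = p[j + 1] if j < len(p) - 1 else right_ghost[k]
            new.append(p[j] + alpha * (left - 2 * p[j] + right))
        new_parts.append(new)
    return new_parts


u0 = [100.0] + [0.0] * 14 + [50.0]  # arbitrary initial temperatures, 16 points
whole, parts = u0, split(u0, 4)
for _ in range(10):
    whole = step_global(whole)
    parts = step_distributed(parts)
print([x for p in parts for x in p] == whole)  # prints True
```

Skipping the communication phase would leave each slice evolving on its own, and the answers would diverge. That exchange at every step is what makes the workload tightly coupled and sensitive to interconnect latency.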
"Writing for MPI is not easy," says Ed Turkel, product marketing manager for the HPC division at HP. "Most of our industrial accounts go to an ISV with a commercial application."
Vendors of these applications, including Abaqus, Accelrys, Fluent, Landmark, MSC.Software and Schlumberger, live in a different world from typical enterprise application providers. "Many of these applications tend to look more homegrown than off-the-shelf," Verari's Driggers says.
Driggers adds that programmers are finding new, better ways to free applications from tight coupling, but even loosely coupled applications usually have to be written that way from the start or modified to support clustering. Most ISVs don't see sufficient demand to go to the trouble of modifying their applications for HPC, so customers are left to do it on their own, something they are seldom equipped to do.
A full-service approach
"Most enterprise users are afraid to modify an application without ISV approval," Gillett says. "Some will approach the ISV, who will tell them to work with a vendor such as United Devices, Platform Computing, or DataSynapse," which provide grid or clustering software and services for modifying applications to run on them. Software licensing schemes are also typically not geared to clustering and grid scenarios, where per-server or per-CPU pricing is prohibitive.
For this reason, hardware vendors are moving toward providing customised, turnkey HPC solutions, either mixing open source HPC components such as MPICH, LAM, and the Globus Toolkit with their own components or partnering with grid and clustering software vendors or HPC application ISVs.
"We go through a lot of presales rigour with the customer," says Victor Mashayekhi, senior engineering manager for high-performance clustering at Dell. "We'll run their codes and make the performance results available, we'll install the images, and we'll merge all the hardware pieces into racks, cable them up, and ship the racks to them."
Hardware vendors also often integrate parallel cluster file systems, such as Lustre or products from HP, Ibrix, and PolyServe, which are necessary to make HPC work.
Although the number of enterprise HPC applications is growing, the relative difficulty of getting started means you'll still find HPC mainly in the more technical departments. You're also much more likely to find clusters of fewer than 10 servers, or a few tens, than clusters numbering in the hundreds or thousands.
"You'll find HPC in financial services departments running actuarial workloads and trading analysis, or in engineering design and manufacturing," says Gillett.
"We see HPC doing things like airline route scheduling to fill seats, and in the trucking industry to maximise the use of their fleets," adds Dave Turek, vice president for Deep Computing at IBM, noting that industrial design, digital content creation, and gaming are also strong markets.
Trickling into the mainstream
Two recent developments hold some promise for pushing HPC more into the mainstream, however. The first is Microsoft's entry into the HPC market, expected in the first half of 2006, with Windows Compute Cluster Server 2003.
Microsoft is aiming squarely at the applications that now rely on Linux HPC solutions, by partnering with classic HPC application vendors such as Accelrys, MathWorks, Schlumberger and Wolfram Research, who plan to build Windows versions of their HPC applications.
"It wouldn't be too difficult for a biologist to set up a small Windows Compute Cluster of servers in his office rather than having to go to the organisation's 'high priest of clustering'," says Jeff Price, senior director for the Windows server group at Microsoft.
Northrop Grumman has already been testing Windows Compute Cluster Server 2003 on an 18-node cluster of dual Opteron servers to analyse huge volumes of satellite data simulating the detection of ballistic missile launches. "It integrates easily with our current Windows infrastructure," says Andrew Kaperonis, a systems/simulation engineer at Northrop Grumman.
The second exciting development is the movement toward SOA (service-oriented architecture). Because SOA is inherently componentised, SOA application workloads are easier to distribute across a clustered environment.
"SOA is all about abstracting away the fundamental plumbing, messaging, multithreading, execution environment in a container done once so the application developer can just focus on writing the application logic," says Platform Computing's Songnian Zhou. "SOA will make grid computing easier and grids will be a must for successful SOA."
Today, however, there remain significant challenges to building and managing a viable high-performance computing implementation and, particularly, to finding or modifying the software to run on it effectively. HPC is still best suited to highly technical, processing-intensive applications with specific characteristics (see "The 14 habits of perfect high-performance clustering applications"), backed by extensive help from software and hardware vendors that can deliver a complete solution.
As a growing number of enterprises begin to see the advantages of cluster and grid computing, however, these technologies will undoubtedly work their way into other mainstream areas.
"We're seeing more and more instances of clusters and grids acquired for something like bio-informatics or financial calculations but then partitioned off for payroll and logistics," says IBM's Dave Turek. The combination of more widespread use, easier Windows-based clustering, and SOA may indeed one day make high-performance clustering and grid computing a fairly mainstream enterprise application.
[sidebar]
The 14 habits of perfect high-performance clustering applications
1 They are compute-intensive.
2 Their processes are batch-oriented, rather than event-driven.
3 They are mostly stateless - although some statefulness is OK.
4 They are built with components or a service orientation.
5 They have standards-based interfaces.
6 Their business logic is independent of the database and presentation layers.
7 They are licensed on a per-user basis, not per server or CPU.
8 Their work is easily divisible into parallel tasks that can be processed individually.
9 Their parallel tasks are not interdependent.
10 Their parallel tasks have no order-of-execution requirements.
11 Their parallel tasks have a high compute-to-data ratio.
12 Their parallel tasks have modest memory requirements.
13 The results of their parallel tasks can be easily assembled for presentation to the user.
14 They run on Linux or Windows Server 2003.
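Several of these habits (9 through 12) can be checked mechanically before a job is moved onto a cluster. The sketch below is a hypothetical screening helper: the task fields are assumptions, and the thresholds (at least one CPU-second per megabyte shipped, at most 2,048MB of memory per task) are illustrative values, not figures from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu_seconds: float  # estimated compute time on one node
    input_mb: float     # data that must be shipped to the node
    memory_mb: float    # peak working set on the node
    depends_on: set = field(default_factory=set)  # tasks that must finish first


def suitability_problems(tasks: list, min_ratio: float = 1.0, max_memory_mb: float = 2048) -> list:
    """Return reasons a task set breaks habits 9-12; an empty list means it passes.

    min_ratio is CPU-seconds per megabyte shipped. Both thresholds are
    illustrative assumptions only.
    """
    problems = []
    for t in tasks:
        if t.depends_on:  # habits 9 and 10: no interdependence, no ordering
            problems.append(f"{t.name}: depends on {sorted(t.depends_on)}")
        if t.input_mb > 0 and t.cpu_seconds / t.input_mb < min_ratio:  # habit 11
            problems.append(f"{t.name}: compute-to-data ratio too low")
        if t.memory_mb > max_memory_mb:  # habit 12
            problems.append(f"{t.name}: memory requirement too high")
    return problems


sample = [Task("a", 60, 10, 512), Task("b", 1, 100, 512), Task("c", 60, 10, 4096, {"a"})]
print(suitability_problems(sample))
```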
Inside Windows Compute Cluster Server
It used to be that building a usable compute cluster took plenty of money, skills, and space in the datacentre. Although creating the actual applications that run on the cluster can still be difficult, nowadays building a Linux-based cluster is generally quite simple. Commercial and open source clustering packages abound with features, open protocols, and streamlined installs. No surprise, then, that Microsoft wants a piece of this potentially lucrative market.
I recently got a chance to test drive Windows CCS (Compute Cluster Server), currently in beta and scheduled for general release sometime in 2006. CCS is made up of several tools layered onto a standard Windows Server 2003 build. In fact, deploying a cluster node is identical to building a standard server and then applying the clustering package, which will be available for purchase separately.
As is usual for Microsoft, the new clustering tools leverage a number of existing Microsoft technologies. ICS (Internet Connection Sharing) on the head node provides NAT for the compute nodes, which sit on a private network. RIS (Remote Installation Services) provides unattended installation of the compute nodes from the head node. The cluster management console is a plug-in to Microsoft Management Console. All authentication is provided by Active Directory, which allows for quick integration into an AD network.
The backbone of CCS, however, is MPICH2, Argonne National Laboratory's implementation of the MPI (Message Passing Interface) standard. Microsoft has done a significant amount of work to bring it to Windows and, more interestingly, has contributed that code back to the project. Kudos.
Setup and configuration of a CCS cluster uses a task-list approach, walking users through the necessary steps. As of now, this process is a bit too automated for my taste, leaving those with clustering experience wondering exactly what's going on in the background - and nowhere to look when troubleshooting. The current build of CCS is also quite raw in some places, such as the job scheduler, but then, it is still in beta.
I haven't had too much time to work with CCS in the lab, but so far I've managed to build a basic distributed DSA (Digital Signature Algorithm) key generation app running across the cluster, depositing generated keys to a common network location. Much further testing and more complex code will be necessary to truly put CCS through its paces, however.
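The general shape of such a job, in which independent tasks each write a uniquely named result file to a shared location, can be sketched as follows. This is not the author's test code: a thread pool stands in for the cluster nodes, a temporary directory stands in for the network share, and secrets.token_bytes produces random bytes as a placeholder because real DSA key generation needs a cryptography library. The function names are hypothetical.

```python
import secrets
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def generate_key(job_id: int, out_dir: Path) -> Path:
    """One independent task. token_bytes is a placeholder, NOT DSA key generation."""
    key = secrets.token_bytes(32)
    path = out_dir / f"key-{job_id:04d}.bin"  # unique name per task avoids write collisions
    path.write_bytes(key)
    return path


def run_keygen(n_keys: int, out_dir: Path, n_workers: int = 4) -> list:
    """Fan key-generation tasks out to workers; all results land in one shared directory."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(generate_key, range(n_keys), [out_dir] * n_keys))


with tempfile.TemporaryDirectory() as shared:  # stand-in for the common network location
    written = run_keygen(8, Path(shared) / "keys")
    sizes = [p.stat().st_size for p in written]
    names = [p.name for p in written]
```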
To test the software, Microsoft provided a RocketCalc Saturn four-node personal cluster equipped with eight AMD Athlon64 2GHz CPUs and 8GB of RAM. The Saturn is a cool piece of hardware regardless of the OS, and it highlights the market Microsoft seems to be targeting: the minicluster. Instead of running one large cluster in the datacentre, it's feasible to deploy something like the Saturn to an individual engineer's cube. Could this style of clustering become the eventual core market for CCS? Time will tell.
- Paul Venezia