High-Availability Benefits from Virtualization

Agencies combine virtualization with fault-tolerant hardware for ultimate resiliency — and to ease server management.

Dan Tynan is a freelance writer based in San Francisco. He has won numerous journalism awards and his work has appeared in more than 70 publications, several of them not yet dead.

MAY 2010

It’s been said that some organizations are too big to fail. The same is true of your critical applications and data. A server crash, a power failure, even user error can cause your systems to become unavailable precisely when you need them most. That’s why more federal agencies are turning to virtualization to ensure high availability of their most critical IT assets.

“The benefit to using virtualization for high availability is that it’s much simpler for IT managers,” says Dan Kusnetzky, vice president of research operations for the 451 Group. “You don’t have to change applications manually if they’re running inside encapsulated servers or clients using motion technology. Virtualization offers simplicity, in that you have multiple machines running on a single server and the workload can move back and forth as needed.”

Of course, high availability means different things to different people. For some, it’s having a virtualized system where, if a critical app or even an entire server fails, a new virtual machine automatically takes over within minutes or possibly seconds. For others, it’s using fault-tolerant servers that provide full hardware redundancy, allowing for real-time replication of processing and data storage and assuring uptime that approaches 99.999 percent.
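To put a figure like 99.999 percent in perspective, it helps to convert an availability target into a yearly downtime budget. The short Python sketch below does that arithmetic; the function name and the list of availability tiers are illustrative assumptions, not figures from any vendor or agency cited here.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Illustrative tiers only; "five nines" is the 99.999 percent level.
for level in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(level)
    print(f"{level}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At five nines, the budget works out to a little more than five minutes a year, which is why a failover that takes minutes can consume the whole allowance in a single incident.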

Keep It Moving

“The apps we worry most about are our web-based electronic document workflow application, selected applications in our eTools web-based suite of contract management applications, the database underlying the eTools apps and, of course, everyone’s No. 1 mission-critical app, e-mail,” says Michael R. Williams, CIO for the Defense Contract Management Agency. DCMA is responsible for making sure Defense Department contractors meet all their contractual obligations.

Williams says the primary goal in moving DCMA to a virtualized environment was to save money by collapsing 17 data centers into two. Higher availability was just “the icing on the cake,” he says. “It turns out virtualization brings with it flexibilities that can be leveraged to increase redundancy (for example, clustering in virtual mode and virtual failover machines in standby status). That, plus the ability to rapidly fire up a virtual server to pick up the workload from a failed machine, increases availability.”

Virtualization alone, however, won’t guarantee continuous operation. The most reliable approach is to create a virtualized environment using fault-tolerant hardware to synchronize data processing across multiple virtual machines.

When it comes to the need for continuous operation, few agencies can match the Federal Aviation Administration, which has used fault-tolerant hardware from Stratus for air traffic control and critical systems for more than 30 years. Two years ago, FAA began virtualizing its operations-critical international message switching system using Stratus systems running VMware ESX.

Building in fault-tolerant hardware to ensure that systems are continuously available can add 25 percent to 35 percent to the cost, says Denny Lane, director of product management for Stratus. But the alternative can be far costlier.

Customers rarely take the time to calculate how much downtime really costs them, Lane says. The amount of time it takes to resync and restart systems, the loss of data and productivity and the consequences of failing to comply with federal regulations can be enormous.
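One rough way to run that calculation is to ask how many hours of avoided downtime would pay for the fault-tolerance premium. The Python sketch below uses the 25 percent to 35 percent range Lane cites; the base system cost and the hourly downtime cost are hypothetical placeholders that an agency would replace with its own numbers.

```python
def fault_tolerance_premium(base_cost: float, premium_rate: float) -> float:
    """Extra spend for fault-tolerant hardware, as a fraction of the base cost."""
    return base_cost * premium_rate


def break_even_downtime_hours(base_cost: float, premium_rate: float,
                              downtime_cost_per_hour: float) -> float:
    """Hours of avoided downtime needed to recover the premium."""
    return fault_tolerance_premium(base_cost, premium_rate) / downtime_cost_per_hour


# Hypothetical inputs for illustration only; neither figure comes from the article.
BASE_COST = 100_000               # assumed cost of a conventional system, in dollars
DOWNTIME_COST_PER_HOUR = 10_000   # assumed cost of one hour offline, in dollars

# The 25 percent and 35 percent premiums are the range quoted above.
for rate in (0.25, 0.35):
    hours = break_even_downtime_hours(BASE_COST, rate, DOWNTIME_COST_PER_HOUR)
    print(f"A {rate:.0%} premium pays for itself after about {hours:.1f} hours of avoided downtime")
```

This simple model counts only a flat hourly cost. Lost data, resync time and regulatory penalties, which Lane also mentions, would shorten the break-even point further.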

“Even if you have a clustered hardware solution and it fails and restarts, what’s going on during that time can be lost forever,” Lane says. “A high-availability system that gets rebooted may not be good enough. If you have gaps in the auditability of data, you can run into penalties.”

Kusnetzky agrees: “The lowest level of high-availability requirements can be met by virtual machine software combined with motion technology, but the highest levels of availability cannot be achieved by virtualization because the transition time is too long,” he says. “Put in boxes designed for continuous availability, have virtualization software running on them, and you’ll never see a failure.”

Three Questions to Answer

Which apps require high availability? Enabling an app for high availability typically costs more because of the need for redundant hardware and software. For that reason, an organization must decide which systems really are critical and need to have 24x7 availability.

What's the required uptime? An organization must also decide how much downtime is acceptable. Will going offline a few minutes a month affect your operations? How about a few seconds? Apps that need to run continuously require more planning and an investment in fault-tolerant hardware.

Is there a continuity strategy? Even the best failover strategy will falter if a natural disaster wipes out an organization’s regional infrastructure. If you need five-nines uptime, be prepared to replicate critical systems and data at a second location — ideally, in a different time zone.

27% Data centers that have experienced an outage in the past year

56% Data center managers who identify availability as their No. 1 priority

1st Rank of human error as the cause of most data center outages

50 minutes Average enterprise IT downtime per week

3.6% Annual enterprise revenue lost to downtime, on average

SOURCES: Aperture Research Institute, Emerson Network Power, EMA Research, Infonetics Research
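The 50-minutes-per-week figure can be restated as an availability percentage for comparison with targets such as five nines. The Python sketch below does the conversion; the function name is an illustrative assumption, and the calculation treats the weekly average as if it held steady all year.

```python
# Minutes in a week: 7 days * 24 hours * 60 minutes = 10,080.
MINUTES_PER_WEEK = 7 * 24 * 60


def availability_percent(downtime_minutes_per_week: float) -> float:
    """Convert average weekly downtime, in minutes, to an availability percentage."""
    return 100 * (1 - downtime_minutes_per_week / MINUTES_PER_WEEK)


weekly_downtime = 50  # average enterprise IT downtime per week, from the figures above
print(f"{weekly_downtime} minutes of downtime per week is about "
      f"{availability_percent(weekly_downtime):.2f}% availability")
```

By this measure, the average enterprise runs at roughly 99.5 percent availability, well short of the 99.999 percent that fault-tolerant systems aim for.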