Forums: Community: Campground:
"Bombproof"

daphna


Sep 5, 2004, 4:22 AM


My IT-dude husband (not the author of the article below) showed me this. [The connection to rock climbing becomes apparent at the end of the first paragraph and runs through the rest of the intro, but not beyond it, so I didn't paste the rest here; the link is at the end.]

Fault-Tolerant Networking
By Brian Walsh
Introduction
You will succeed in building fault-tolerant networked systems only if you have a certain attitude. The incident that gave me that attitude happened when I was eight years younger, about fifteen pounds lighter, with about 150 feet of air between my feet and the ground.

I had worked for a few years as an engineer with Tandem and Stratus fault-tolerant transaction processing systems. They were successful projects. We coded up a storm of checkpoints and rollbacks, put in procedures for storing database logs on mirrored media and installed dial backup links to all our sites. I thought that I had successfully mastered the concepts of redundancy, recoverable transactions and fail-over systems. But until I started rock climbing (and slipped), I didn't really realize how important those concepts were.

Attitude makes the difference
Rock climbing and building fault-tolerant systems have a lot in common. All the gear in the world does not make a safe mountaineer. All the hardware and software you can buy won't make your systems and data free from downtime unless you approach the task with the right perspective.

When you climb rocks, your life is constantly at stake until you are back on the ground. The minute you lose sight of that fact, all the ropes, chocks, harnesses and training are worthless. It's similar with networked systems. If you can't respect the data flowing through the network and your servers as if every transaction represented your money or your life, your designs, implementations and procedures will ultimately be ineffective.

Keep that in mind the next time someone forgets to tell users that their services are going down.

"Bombproof"
"Bombproof" is a wonderful word. It doesn't mean "things will be all right as long as the disk doesn't fail," or "if a certain hub dies we will only lose the fourth floor." It means that you can drop a bomb on the network, and users will keep working. It means that you can depend on it.

A leader, in climbing terms, is the one who assumes the lion's share of the risk by climbing first and setting protection for others to follow. I sometimes wonder what would happen if other fields applied the same definition of a leader. The first thing a leader does when climbing is to establish a bombproof anchor: a set of redundant, simultaneous, load-bearing devices attached to the cliff. The leader starts the climb from there.

At this point, the leader has taken a recoverable risk. A fall means dropping twice the distance the leader has climbed above the anchor. Then the rope goes taut, and forces several times the climber's weight pull on the anchor. If the anchor fails, the leader gets hurt or dies. Hence the importance of the anchor.

As the leader climbs higher, he or she places intermediate anchors. Falls will then stop at the highest intermediate anchor. If that anchor fails, the fall continues until the next highest anchor. The fewer anchors placed, the nastier the fall.

When the lead climber exhausts the length of the rope, the leader sets another bombproof anchor, relieves the second climber of the responsibility for holding the rope and assumes responsibility for the second climber, who follows the leader's route. This is repeated for the rest of the climb.

This simple system has been used to scale walls from the old rock quarry outside your home town to the cliffs of Yosemite. For veteran explorers on Baffin Island and flabby weekend warriors out for the afternoon, the stakes remain the same.

Your role
Your role is analogous to the climbing leader's. You must guide management and users through the design, selection, purchase and implementation of fault-tolerant networks. Even though you may fade into the background once the system is established, you are responsible for their safety en route. Their data, and therefore their livelihoods, depend on your design and decisions.

Hardware and software vendors can't remove you, the engineer and decision maker, from that role. A hardware vendor can make a hub or server fault-resilient or even fault-tolerant. But if you don't specify and design the correct cable plant or implement a realistic backup schedule, disaster looms.

It doesn't matter where the outage comes from. An outage is an outage. Too many technical support staff and managers wash their hands of problems, saying, "The network is up, it must be the server," or "My server is up, it must be the network." Different disciplines need to work together to achieve fault tolerance: workstation and server vendors, in-house developers, cable designers and installers, router and hub designers, WAN architects, database vendors, everybody.


If you do want to read the rest of it, including the non-climbing material, check it out here: http://www.networkcomputing.com/netdesign/faultintrob.html

