What is a computer cluster?

Borderlinux

I'll humor a serious answer... Clusters serve multiple purposes. Typically, in "Mission Critical" (MC) enterprise environments, clusters are formed for fault tolerance. A cluster is more than one computer working as a team. A simple cluster could be 2 "nodes". Each node is a separate physical computer (although technically you could create a cluster of virtual machines inside one physical machine, or even inside one physical or virtual partition...I digress). The idea is that an MC application runs on one node, configured in such a way that if that server crashes, the application "fails over" to another node, thereby limiting the downtime of the application.

You can have a cluster of computers sitting in the same room, but if the room exploded then that's not too fault tolerant. A "Campus Cluster" offers more resiliency, up to the distance limit of your cabling for a direct connection between nodes. You could have one node in one building and another node in another building, so a building could blow up and your application could still fail over to the other building. Next step up is a "Metro Cluster", where you use storage arrays like an EVA or XP12000/24000 at 2 locations in the same city, with data replication between sites handled by the storage array hardware. In this configuration you could lose a whole campus and still fail over to another data center on the other end of town. There is also a "Continental Cluster", which uses different but similar technology to fail over across vast distances...your town could get nuked and you could still fail over your application(s).

Load balancing is another use for clusters. Example: a 4 node Oracle RAC cluster, where all 4 nodes mount and access the exact same filesystem(s) concurrently. You could have the database data stored on a clustered file system, and run 4 separate database instances on 4 separate nodes, all accessing the exact same filesystem in real time.
That can be cost effective: use 4 32-core servers costing $160k each (like 32-core Itanium2 rx8640s), $640k in total, instead of one $1,000,000 128-core "Superdome" spec'd out in a similar fashion, while still gaining availability (if one node fails, the other 3 are still running...they just need to pick up the load of the crashed node).

Clusters that are designed for High Availability run services that do absolutely anything they can to preserve data integrity...including crashing the whole cluster. There are many safeguards built in to clustering daemons to accomplish exactly that, like lock disks or quorum servers, used in the event that exactly half of the nodes get into a race to determine who is in control of the lock disk. Basically, if one node of a 3 node cluster loses connectivity to the other 2 nodes, it knows it *used* to be in a 3 node cluster and that it is now only one node (less than half), so it will not act as the "master". If there is a 2 node cluster, however, and connectivity is lost between them, then each one knows it is exactly half of the cluster. Since it has no way to know if the other node is running (it lost connectivity, remember?), it will decide to crash rather than risk controlling a filesystem while another server may be trying to write to the exact same filesystem (which would cause filesystem corruption and data loss).

That's where a lock disk or quorum server comes in: it serves as "referee" and gives each node someplace else to race to and "mark its territory". One server will win the race and make its mark, then the other node will see the mark and know it lost the race. This allows the winner to fire up and run the application(s).
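If it helps to see the decision as logic, here is a rough Python sketch of the majority / tie-breaker rule described above. It is only an illustration, not how any real clustering daemon is written: the `LockDisk` class, the `decide` function, and the "run"/"halt" outcomes are names made up for this example, and a real lock disk or quorum server does its marking over shared storage or the network rather than in memory. The minority case is modeled as "halt" for simplicity; the description above only says such a node will not act as the "master".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockDisk:
    """Hypothetical stand-in for a lock disk or quorum server: first claim wins."""
    owner: Optional[str] = None

    def try_claim(self, node: str) -> bool:
        # The first node to arrive makes its "mark"; later arrivals see it and lose.
        if self.owner is None:
            self.owner = node
        return self.owner == node


def decide(node: str, visible_nodes: int, cluster_size: int,
           lock: Optional[LockDisk] = None) -> str:
    """Return "run" or "halt" for a node that just lost contact with some peers.

    visible_nodes counts this node plus every peer it can still reach.
    """
    if 2 * visible_nodes > cluster_size:
        return "run"   # clear majority: safe to run the application
    if 2 * visible_nodes < cluster_size:
        return "halt"  # minority: the other side may be running, so stand down
    # Exactly half of the cluster: only the referee can break the tie.
    if lock is not None and lock.try_claim(node):
        return "run"
    return "halt"


lock = LockDisk()
print(decide("node1", 1, 3))        # halt (1 of 3 is a minority)
print(decide("node1", 1, 2, lock))  # run  (won the race to the lock)
print(decide("node2", 1, 2, lock))  # halt (sees node1's mark)
print(decide("node1", 1, 2))        # halt (no referee, so nobody may run)
```

The last call is the 2 node case with no lock disk or quorum server at all: each node gets "halt", which is the "both crash" outcome.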
Without the "mark", be it a lock disk or quorum server communication, there is no way for either node to know whether the other one is alive. So in the interest of protecting the data (ensuring multiple servers cannot possibly write to the same filesystems at the same time), they will *both* crash and the application will not have any place to fail over to.

I can go in to a LOT more detail and try to clear up confusion, like why multiple nodes writing to a filesystem is bad in my last example while the entire basis of Oracle RAC clusters is to allow multiple computers to read/write the same physical filesystem concurrently, etc...but I probably got too technical already. I work with these things every day...Superdomes, Metro Clusters, etc...systems that cost more than I earn in a decade. Clustering is difficult (for me) to explain without going in to the nitty gritty details, since I have to troubleshoot them down deep.

I took back the -5 on this one and gave ya a point...I'm going zzz. You pissed me off earlier and I haven't really changed my mind...but what the heck, I'll toss out this info and see if it's even appreciated or if you're just trolling.
