This cluster was created for Dr. Shin Han Shiu for use in decoding plant
genomes and other bioinformatics-related tasks.
Software:
It runs the Rocks clustering distribution, which is based on CentOS (itself based on Red Hat Enterprise Linux) and is pretty much a cluster on a disk. It contains Ganglia for monitoring, MPI and Sun Grid Engine (SGE) for task queuing, a pre-configured kickstart server for painlessly building computing nodes, pre-configured DHCP, NFS and other network services, Apache, MySQL and just about anything else you will need for a cluster.
Architecture:
One head and many nodes is the basic idea. The head is the machine that runs all the private and public daemons needed to operate the cluster. It has two gigabit network cards: one with a public IP, plugged into one of my public switches, and the other plugged into a dedicated cluster switch with no uplink to the internet or public network. Each node has one network interface that is plugged directly into this ‘private’ switch. You can think of it as a NAT network with the head acting as the firewall/router.
Hardware:
This particular cluster was pieced together from parts chosen to fit specific needs. We wanted to maximize the number of CPU cores per rack unit, so we went with dual AMD Opteron machines with dual-core CPUs. That gave us a total of 4 cores per machine in what was supposed to be 2 rack units of space. Somewhere along the line between the university’s purchasing department and the vendor we ended up with 3U machines.
Each machine has a DVD-ROM drive so I don’t need to swap CDs during OS installs.
Each node uses a Tyan Thunder K8SD Pro motherboard because it supports our CPU choice, has integrated video and dual gigabit Ethernet, has PCI-X slots to support RAID controllers, and can accept up to 32 GB of PC3200 RAM.
The head contains two 3Ware SATA RAID controllers. One is an 8-port card that has 1.5 terabytes of storage running RAID5 for the storage array. The other is a 4-port card that has 3×120 GB drives running RAID1 with 1 hot spare for the OS drive.
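As a rough sanity check on what those two arrays provide, here is a minimal Python sketch of the usual capacity rules of thumb (RAID1 keeps one drive's worth of space, RAID5 gives up one drive to parity). The function names are my own, and the RAID5 call uses purely hypothetical drive numbers, since the drive count and size behind the 1.5 terabyte array are not given above.

```python
def raid1_usable_gb(drive_gb, drives, hot_spares=0):
    """Usable space of a RAID1 mirror: one drive's worth of capacity."""
    active = drives - hot_spares
    if active < 2:
        raise ValueError("RAID1 needs at least two active drives")
    return drive_gb


def raid5_usable_gb(drive_gb, drives, hot_spares=0):
    """Usable space of a RAID5 array: one active drive's worth goes to parity."""
    active = drives - hot_spares
    if active < 3:
        raise ValueError("RAID5 needs at least three active drives")
    return drive_gb * (active - 1)


# OS array from the description: 3 x 120 GB drives, RAID1, 1 hot spare.
os_array_gb = raid1_usable_gb(120, drives=3, hot_spares=1)
print("OS array usable space:", os_array_gb, "GB")

# Hypothetical example only: the real drive count and size are not stated.
example_gb = raid5_usable_gb(250, drives=7)
print("Hypothetical 7 x 250 GB RAID5:", example_gb, "GB")
```

The hot spare adds no usable space; it just sits idle until one of the mirrored drives fails.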
We then have Type 1 and Type 2 nodes, the only difference being that a Type 1 node has 2 GB of RAM and a Type 2 node has 16 GB. The quantity breakdown is as follows:
Head: 1
Type 1: 3
Type 2: 1
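To put numbers on that breakdown, the totals for the compute nodes can be tallied with a few lines of Python. This is only a sketch: the variable names are mine, it assumes every compute node is the dual-socket, dual-core machine described above, and it leaves the head out because its RAM is not stated.

```python
# Per-machine CPU layout as described: two sockets, two cores each.
SOCKETS_PER_MACHINE = 2
CORES_PER_SOCKET = 2

# Compute node types and counts from the list above.
compute_nodes = {
    "Type 1": {"count": 3, "ram_gb": 2},
    "Type 2": {"count": 1, "ram_gb": 16},
}

cores_per_machine = SOCKETS_PER_MACHINE * CORES_PER_SOCKET
node_count = sum(t["count"] for t in compute_nodes.values())
total_cores = node_count * cores_per_machine
total_ram_gb = sum(t["count"] * t["ram_gb"] for t in compute_nodes.values())

print("Compute nodes:", node_count)
print("Compute cores:", total_cores)
print("Compute RAM:  ", total_ram_gb, "GB")
```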
Setup:
The setup should have been far easier than it was. Let’s just say that I have had a heck of a time making 3Ware SATA RAID controllers work on x86_64 Red Hat-based Linux distributions. This isn’t the first server I have had these problems on, and it has made me decide that I will be using LSI MegaRAID cards from now on.
Head setup:
I stuck the boot DVD in the head and at the boot prompt typed ‘frontend’ to indicate that it shouldn’t try to boot a kickstart install. Having the kickstart install as the default is very useful, as you will be installing far more computing nodes than you will heads (one to many).
The OS install is fairly similar to any other text-based Red Hat install. I think it said it could do a graphical install, but apparently it doesn’t support the integrated ATI Rage XL on the Tyan motherboard.
I won’t go through step by step, but I will spew out a few useful tidbits of wisdom that I discovered along the way.
Let it automatically set up your partitions. It was smart enough to turn my RAID5 array into storage and break up the OS-specific partitions on the smaller RAID1 array. The partitions it set up for the OS were very sane.
Watch closely as you configure your networking. One interface is the public one and the other is the private one. It will annoy you if you try to whiz through and assign them in the wrong order.
I installed every Roll (Rocks packages) available on the DVD as we plan on using both MPI and SGE.
Node setup:
On the head you will need to log in as root (either via ssh or locally) and run ‘insert-ethers’. This is a sort of wait-for-call screen like we had in the BBS days, only it’s used for initiating a kickstart with a node. Those familiar with Red Hat’s kickstart may wonder why they have to do this instead of just leaving the kickstart server running. The answer (as best as I can determine) is security. This way a machine can’t be introduced into your network without you (root) initiating it.
Next we stick the boot DVD or CD into the first node. Since this is the first rack (referred to as a cabinet in the Rocks docs) and the first node, it is compute-0-0. So if you have multiple racks, the 3rd node in the second rack would be compute-1-2. That is, if you choose to stick with their naming scheme. Honestly, I see no reason not to. It’s descriptive and well thought out; besides, you can set the public interface’s hostname to anything you like.
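The naming scheme is simple enough to express in a few lines of Python. This is just an illustration of the convention described above, not anything Rocks itself ships; the helper names node_name and parse_node_name are made up for the example.

```python
import re


def node_name(rack, rank, prefix="compute"):
    """Build a name like compute-<rack>-<rank>, both counted from zero."""
    if rack < 0 or rank < 0:
        raise ValueError("rack and rank must be non-negative")
    return f"{prefix}-{rack}-{rank}"


# Matches names of the form <prefix>-<rack>-<rank>, e.g. compute-1-2.
_NAME_RE = re.compile(r"^(?P<prefix>[a-z][a-z0-9]*)-(?P<rack>\d+)-(?P<rank>\d+)$")


def parse_node_name(name):
    """Return (rack, rank) for a name such as compute-1-2."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a <prefix>-<rack>-<rank> name: {name!r}")
    return int(match["rack"]), int(match["rank"])


# First node in the first rack, then the 3rd node in the second rack.
print(node_name(0, 0))
print(node_name(1, 2))
print(parse_node_name("compute-1-2"))
```

Because both numbers count from zero, the "3rd node in the second rack" becomes rack 1, rank 2.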
After it boots from the CD/DVD and connects to the kickstart server, you will see it appear in ‘insert-ethers’ on the head. You can identify it by the MAC address that it displays. At this point the head sends the kickstart information to the new node and registers it with all necessary services, including DHCP, its MySQL database and so forth.
“Rinse and repeat” for each of the other nodes and you have a fully functional HPC (high performance computing) cluster.
















