For my job I've been investigating clustering software. We have some jobs that use up to 4 GB of RAM and can take weeks to run (though most are much smaller). We want to build a farm that can make efficient use of the hardware in a fixed pool of servers plus the developer workstations.
My current ideal list of requirements is below:
- Preferably OpenSource.
- Runs on Linux and Solaris.
- Should not require changes to the application (e.g. relinking against its libraries).
- A single point of management which controls all elements of the cluster (jobs, nodes etc) and some form of GUI management tools.
- A console interface so tasks can be scripted.
- Transparent checkpointing and migration
- The ability to run jobs on employee workstations while they aren't using them (overnight, during lunch etc), to notice when the employee returns, and to automatically move the job elsewhere.
- The ability to define groups of servers (each with different parameters such as availability, RAM, CPU etc) and to submit jobs to a specific group. Servers should be able to belong to multiple groups.
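As a rough sketch of what the last two requirements amount to, here is a small Python model of overlapping server groups. It is only an illustration: the names `Node`, `Cluster`, `candidates` and the host names are made up, none of the packages listed here necessarily model things this way, and detecting that a workstation is in use is assumed to happen elsewhere and is passed in as a plain set.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Node:
    """One machine in the farm (hypothetical fields)."""
    name: str
    ram_mb: int
    cpus: int
    is_workstation: bool = False


@dataclass
class Cluster:
    nodes: Dict[str, Node] = field(default_factory=dict)
    groups: Dict[str, Set[str]] = field(default_factory=dict)

    def add_node(self, node: Node, *group_names: str) -> None:
        """Register a node; it may belong to any number of groups."""
        self.nodes[node.name] = node
        for group in group_names:
            self.groups.setdefault(group, set()).add(node.name)

    def candidates(self, group: str, min_ram_mb: int,
                   busy_workstations: FrozenSet[str] = frozenset()) -> List[str]:
        """Nodes in `group` with enough RAM, skipping workstations in use."""
        result = []
        for name in sorted(self.groups.get(group, ())):
            node = self.nodes[name]
            if node.ram_mb < min_ram_mb:
                continue
            if node.is_workstation and name in busy_workstations:
                continue
            result.append(name)
        return result


cluster = Cluster()
cluster.add_node(Node("big1", 8192, 4), "servers", "bigmem")
cluster.add_node(Node("srv2", 2048, 2), "servers")
cluster.add_node(Node("ws-alice", 4096, 1, is_workstation=True),
                 "workstations", "bigmem")

# A 4 GB job submitted to the "bigmem" group, with nobody at their desk:
overnight = cluster.candidates("bigmem", 4096)
# The same job once the owner of ws-alice is back:
daytime = cluster.candidates("bigmem", 4096, frozenset({"ws-alice"}))
```

A real scheduler would also have to checkpoint and migrate a job that is already running on `ws-alice` when its owner returns; this sketch only covers choosing where a job may be placed.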
Here is the list of software I've investigated to various degrees.
Open Source Applications
Condor - http://www.cs.wisc.edu/condor/
- Platform: Linux 2.2, Sparc/Intel Solaris 2.7/8 and IRIX. Limited support for Linux Alpha and NT.
- + Reports of great reliability.
- - Requires relinking against its libraries (just linking, not recompiling) in order to use the transparent checkpointing and migration features.
Mosix - http://www.mosix.org/
- Platform: Linux 2.2/2.4
- + (?) Supports transparent checkpointing and migration of jobs
- - Requires modifications to the kernel.
- - Primitive management tools (but getting better).
- - Reports of unreliability (not ready for commercial usage).
PBS (Portable Batch System) - http://www.openpbs.org/
- Platform: UNIX (which?)
SunGridware - http://www.sun.com/software/gridware/ (http://gridengine.sunsource.net/)
- Platform: Solaris and Linux (other UNIX 'coming soon')
PHP Job Monitoring system - http://vuksan.com/linux/gemonitor.html
Torque - http://www.clusterresources.com/products/torque/
TORQUE (Tera-scale Open-source Resource and QUEue manager) is a resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations. This version may be freely modified and redistributed subject to the constraints of the included license.
Commercial Applications
- Alfred
(SgiIrix, RedhatLinux) Written by Pixar and specialized for film renders. Has some problems scaling to a very large number of hosts.
- Platform LSF
(SunSolaris, MicrosoftWindows, SgiIrix, RedhatLinux, DebianLinux, more ...) The uber scheduling program that seems to exist everywhere. Very queue-centric and strangely inflexible in some ways.
- Rush
(MicrosoftWindows, SgiIrix, RedhatLinux, AppleOsx) Don't know much about this.
Helper Utilities
- ftsh
- The fault tolerant shell. Looks very cool for writing scripts to run on lots of hosts in a cluster.
- x-CAT
- Extreme Cluster Administration Toolkit by IBM
Other Related Projects
IBM's LUI (Linux Utility for cluster Installation) - http://oss.software.ibm.com/developerworks/projects/lui
- The Linux Utility for cluster Installation (LUI) is an open source utility for installing Linux workstations remotely, over an Ethernet network. What distinguishes LUI is that it is "resource based". LUI provides tools to manage installation resources on the server that can be allocated and applied to installing clients, allowing users to select just which resources are right for each client. Examples of resources supported in LUI 1.1 are the Linux kernel and associated system map, the disk partition table, RPMs, user exits, and local and remote (NFS) file systems. LUI supports both the BOOTP protocol for diskette-based client installation and true network installation using DHCP and PXE.
- System Imager
DebianLinux Cluster Components - A collection of tools useful for managing a cluster.
Other Research
Linux RAM Limitations (see LinuxRamLimits)
Some recommended software for Unix process checkpointing and migration from a SlashDot article.
Net Booting
We built a system where we boot off floppy disks using RedHat's KickStart system. It was actually pretty easy; at some point I should document it here.
- In the meantime, here are some PXE links that I don't want to lose.
A big ass email from John.
- http://www.jncasr.ac.in/~kamadhenu/beowulf-real.html
- http://www.bpbatch.org/docs/linux.html
- http://www.bootix.com/us/price/adminpro.shtml
- http://wwss1pro.compaq.com/support/reference_library/viewdocument.asp?countrycode=1000&prodid=2032|Linux+-+Red+Hat+Linux+7.x&source=163E-0102A-WWEN.xml&dt=21
- http://www.kano.org.uk/projects/pxe/ - linux pxe
- http://support.3com.com/infodeli/tools/nic/mba.htm - 3com NIC MBA utility
- http://www.linuxdevices.com/files/misc/pxe_boot_stb-howto.html
- http://www.k12ltsp.org/clients.html - see reference to NIC computers
- http://www.linuxapps.com/?page=search (linux utility for cluster install)
- http://www.dell.com/us/en/bsd/products/model_pedge_2_pedge_2500.htm (search for pxe)
RedHat PXE install:
- http://www.redhat.com/mailing-lists/kickstart-list/msg01886.html
- http://www.redhat.com/mailing-lists/kickstart-list/msg01889.html
- http://www.slac.stanford.edu/~alfw/PXE-Kickstart/ (!!!! good !!!!)
This is important if you install a lot of machines at the same time. You can watch the syslog file on your TFTP server, and whenever a client has had its initial RAM disk transmitted, you can remove the symlink for that machine from the pxelinux.cfg directory. This forces the client to load the default configuration, which says "Boot from local disk!", when it reboots after Kickstart is done.
Technically, it is not necessary to use a bootloader. PXE can load a Linux kernel directly if told so by DHCP. I did not try this because, for the reasons mentioned in the previous paragraph, I find it less convenient: you would have to change the dhcpd.conf file to tell a machine not to do a network boot any more but to boot from the local hard disk. Editing this file on the fly is more hassle than changing one symlink.
- http://www.SLAC.Stanford.EDU/~alfw/Publications/SLAC-PUB-9193.pdf (!good!)
- http://www.redhat.com/mailing-lists/kickstart-list/thread5.html
- http://www.redhat.com/mailing-lists/kickstart-list/msg01926.html
- http://www.redhat.com/mailing-lists/kickstart-list/msg01807.html
- http://www.redhat.com/mailing-lists/kickstart-list/msg01681.html
- http://www.redhat.com/mailing-lists/kickstart-list/msg01550.html
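The symlink trick described among the PXE links (watch the TFTP server's syslog, then remove the client's symlink from pxelinux.cfg) could be automated along these lines. This is a minimal Python sketch under several assumptions that are mine, not from the notes above: the TFTP daemon logs read requests in the form `RRQ from <ip> filename <path>`, the initial RAM disk is called `initrd.img`, and the per-machine symlinks are named after the client's IPv4 address in uppercase hex (one of the names PXELINUX looks for). The function names are made up for illustration. Note that a read request is logged when the transfer starts, so it is only a proxy for "the RAM disk was transmitted".

```python
import ipaddress
import os
import re

# Assumed log format; adjust the pattern to whatever your TFTP daemon writes.
RRQ = re.compile(r"RRQ from (?P<ip>\d+\.\d+\.\d+\.\d+) filename (?P<file>\S+)")


def pxelinux_name(ip: str) -> str:
    """Per-host config name: the IPv4 address as 8 uppercase hex digits."""
    return format(int(ipaddress.IPv4Address(ip)), "08X")


def client_that_fetched_initrd(line: str, initrd_name: str = "initrd.img"):
    """Return the client IP if this log line is a request for the initrd."""
    match = RRQ.search(line)
    if match and os.path.basename(match.group("file")) == initrd_name:
        return match.group("ip")
    return None


def disarm_netboot(line: str, cfg_dir: str, initrd_name: str = "initrd.img"):
    """Remove the client's symlink so its next boot uses the default config.

    Returns the path of the removed symlink, or None if nothing was removed.
    """
    ip = client_that_fetched_initrd(line, initrd_name)
    if ip is None:
        return None
    link = os.path.join(cfg_dir, pxelinux_name(ip))
    if os.path.islink(link):  # only ever delete symlinks, never real files
        os.remove(link)
        return link
    return None
```

To use it, feed each new syslog line to `disarm_netboot(line, "/tftpboot/pxelinux.cfg")`, where the directory path is again only an example. The `default` file in that directory is a regular file, not a symlink, so it is never touched.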