Thursday, July 28, 2011

Distributed Cluster File System – The Lustre Option

Lustre is a parallel distributed file system with a large installed base in the HPC world. Lustre file systems run in many of the large HPC centers, for example Oak Ridge National Laboratory (more than 10 petabytes of data, I was told) and the Fujitsu K computer, along with many more systems on the Top500 supercomputers list.

The Lustre file system architecture was developed as a research project in 1999 by Peter Braam, who later founded Cluster File Systems (CFS). CFS was acquired by Sun Microsystems, which was in turn acquired by Oracle, which then decided to stop supporting Lustre. Following Oracle's decision, Lustre supporters created several support and development organizations (OpenSFS, OFS etc.).

There are multiple vendors that provide Lustre-based solutions, such as DataDirect Networks, Xyratex and Whamcloud. While the first two provide complete systems, Whamcloud provides support and maintenance for the Lustre code.

Lustre's main website is www.lustre.org. The latest release is Lustre 1.8.5, which provides support for OEL 5, RHEL 5 and SLES 10 and 11, offers several minor improvements, and includes a number of bug fixes.

Building Lustre is not that complex, and you can find many guides out there. A simple Google (or Bing) search will turn up several hits: http://wiki.lustre.org/index.php/Building_and_Installing_Lustre_from_Source_Code, http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-9.html, http://www.hpcadvisorycouncil.com/pdf/Lustre_Best_Practice.pdf and there are more.

The good part about Lustre is the community support. Even if you have a small HPC cluster, Lustre can help with both performance and ease of use. Give it a try. I did.

2 comments:

  1. For a good overview of Lustre, check out our interview at: http://bit.ly/nAZgrw
