Wednesday, August 17, 2011

NCSA Post Blue Waters – What’s Next?

As mentioned in one of my previous posts, IBM and NCSA announced the termination of the Blue Waters project because of the higher-than-expected costs associated with it. In simple words, the NSF grant was not enough to cover the expenses of a proprietary system that was supposed to reach a specific performance goal. I cannot say that this was a surprise... So what's next? There are two obvious options. The first is for NCSA to find another vendor that can build a system that hits the performance target and stays within the budget. The second is for NSF to reclaim the grant and open a new bid for the money.

As I see it, the only kind of system that can meet both the performance goal and the budget limitation is a standards-based one. x86 processors (Intel or AMD) and an InfiniBand network might do the job. Had NCSA been clever enough to go for it the first time, I bet the system would already be up and running, and with the savings they could have focused on some innovative software development.

The Pleiades supercomputer at NASA/Ames Research Center is a great example of a cost-effective, high-performance system: 184 racks (11,712 nodes), 1.315 Pflop/s peak, 1.09 Pflop/s Linpack, 111,872 total cores (no GPUs), an InfiniBand interconnect in a partial 11D hypercube topology (SGI's choice in this case, but you can pick any other topology as well), and 12 DDN RAIDs with 6.9 petabytes in total running Lustre. Could this be a good example of what NCSA needs to do now?
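To put the Pleiades numbers in perspective, here is a small back-of-the-envelope sketch. It only uses the figures quoted above: Linpack efficiency is Linpack (Rmax) divided by peak (Rpeak), the average core count per node is total cores divided by nodes, and a full 11-dimensional hypercube has 2^11 positions. The variable names are my own, and reading the non-integer cores-per-node value as a mix of node types is an assumption, not something stated here.

```python
# Figures for Pleiades as quoted in this post (mid-2011).
PEAK_PFLOPS = 1.315      # Rpeak
LINPACK_PFLOPS = 1.09    # Rmax (Linpack)
NODES = 11_712
CORES = 111_872
HYPERCUBE_DIM = 11       # "partial 11D hypercube"


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak actually sustained on Linpack."""
    return rmax / rpeak


efficiency = linpack_efficiency(LINPACK_PFLOPS, PEAK_PFLOPS)
# A non-integer average hints (assumption) at a mix of node types.
cores_per_node = CORES / NODES
# Number of positions in a *full* 11D hypercube; "partial" means not all are populated.
full_hypercube_vertices = 2 ** HYPERCUBE_DIM

print(f"Linpack efficiency: {efficiency:.1%}")
print(f"Average cores per node: {cores_per_node:.2f}")
print(f"Positions in a full {HYPERCUBE_DIM}D hypercube: {full_hypercube_vertices}")
```

Roughly 83% of peak on Linpack is a solid number for a commodity x86/InfiniBand cluster, which is the point of the comparison.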

