Thursday, July 28, 2011

Distributed Cluster File System – The Lustre Option

Lustre is a parallel distributed file system with a large installed base in the HPC world. Lustre file systems run in many of the large HPC centers – for example Oak Ridge National Laboratory (more than 10 petabytes of data, I was told), the Fujitsu K computer and many more from the Top500 supercomputers list.

The Lustre file system architecture was developed as a research project in 1999 by Peter Braam, who later founded Cluster File Systems (CFS). CFS was acquired by Sun Microsystems, which was in turn acquired by Oracle, which then decided to stop supporting Lustre. Following Oracle's decision, the Lustre supporters created several support and development organizations (OpenSFS, OFS etc.).

There are multiple vendors that provide Lustre-based solutions, such as DataDirect Networks, Xyratex and Whamcloud. While the first two provide complete systems, Whamcloud provides support and maintenance for the Lustre code.

Lustre's main home on the web is www.lustre.org. The latest release is Lustre 1.8.5. Lustre 1.8.5 provides support for OEL 5, RHEL 5 and SLES 10 and 11, offers several minor improvements, and provides a number of bug fixes.

Building Lustre is not that complex, and you can find many guides out there. A simple Google (or Bing) search will show you several hits - http://wiki.lustre.org/index.php/Building_and_Installing_Lustre_from_Source_Code, http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-9.html, http://www.hpcadvisorycouncil.com/pdf/Lustre_Best_Practice.pdf and there are more.

The good part with Lustre is the community support. Even if you only have a small HPC cluster, Lustre can help with both performance and ease of use. Give it a try. I did.

Wednesday, July 27, 2011

Best HPC News Websites

Or where do I go to read the latest and greatest HPC news, HPC developments, HPC usage models, HPC applications and other HPC opinions and views…

1. My favorite is InsideHPC. A great web site that covers it all – from vendor news and opinions to new deployments, user installations and usage models. You can find there very interesting articles and various videos on many subjects. Rich Brueckner is doing a great job there, and even provides near-real-time coverage from multiple interesting conferences and workshops.
2. HPC Wire – one of the major HPC news web sites. It is not my first pick as it has become too "commercial". The web site is too "busy", and it seems that the editor is busy as well with creating vendor "conflicts", probably to draw more comments from readers, instead of providing more comprehensive coverage of the matters that are really important. I prefer to hear from users and vendors rather than to read what the editor has to say…
3. Cluster Monkey - another favorite of mine. Most of it is technical articles and it is a great place for information. Kudos to Douglas Eadline for the great web site. No commercials, just good technical data.
4. Supercomputing Online – basic news coverage.
5. Linux Magazine – basic coverage, some articles are vendor sponsored. You will find there articles from Douglas Eadline, for example.
6. There are several more, and of course there are web sites that are more related to specific countries.

If you know of other good places, let me know.

Thursday, July 21, 2011

NVIDIA GPUDirect – QLogic TrueScale InfiniBand – Real or Not Real?

Nothing personal, but when you come across advertisements that can be misleading, you do want to express your own opinion on the case… Since I am a big supporter of hybrid computing, I try to stay on top of things and find ways to make my GPU clusters run faster. I was happy to see the GPUDirect development – both GPUDirect 1 for remote GPU communications and GPUDirect 2 for local GPU communications. Unfortunately GPUDirect 2 is limited right now to GPUs on the same IOH, but we can always hope for better…

Looking into the best setup for GPUDirect 1, I noticed that there are two options for the InfiniBand side (it seems that GPUDirect 1 is supported on InfiniBand only for now) – the QLogic one and the Mellanox one. Both are listed on the NVIDIA web site. I will focus on the QLogic one for this post.

QLogic published a white paper on “Maximizing GPU Cluster Performance”, and from this paper I quote:  “One of the key challenges with deploying clusters consisting of multi-GPU nodes is to maximize application performance. Without GPUDirect, GPU-to-GPU communications would require the host CPU to make multiple memory copies to avoid a memory pinning conflict between the GPU and InfiniBand. Each additional CPU memory copy significantly reduces the performance potential of the GPUs.” I completely agree with this statement. This is the reason for GPUDirect 1. I would prefer to see a more direct connectivity between the GPU and the network, but for now this is useful as well.

Continuing the quote: "QLogic's implementation of GPUDirect takes a streamlined approach to optimizing…" Wait a second… QLogic InfiniBand relies on the host CPU for the InfiniBand transport (also known as "on-loading"), therefore any data must go through the CPU before it can hit the wire. So how can you avoid having the CPU involved in every GPU communication if you need the CPU to create the InfiniBand network packets??? Well, you cannot. "Streamlined" in this case means no real GPUDirect… What QLogic does not mention in this paper is a comment they made during a presentation they gave before the paper was published. Their test bed system included 2 (yes, two… the number 2) GPUs per node. Therefore in this case, the only real GPUDirect they could test is GPUDirect 2.

In the same paper, they also compare their InfiniBand ("TrueScale") performance to their competitor – meaning Mellanox. This is a tricky situation… if you do such a comparison, do it right. I went to look for any numbers from Mellanox, and was happy to find some but not all. Some of the QLogic results attributed to "Other InfiniBand" could only be found in the QLogic paper, therefore I can assume that these benchmarks were done by QLogic. I compared the results found in the QLogic paper and in the Mellanox publications, and guess what – QLogic published much lower performance for the Mellanox solution than what Mellanox did on their own. For example - the Amber FactorIX benchmark: QLogic claims that Mellanox can achieve 10 nanoseconds/day on 8 GPUs, while Mellanox reported nearly 19 nanoseconds/day on 8 GPUs – nearly twice as much. It would have been much better if QLogic had focused on their own solution, rather than spreading FUD… food for thought.

Wednesday, July 20, 2011

Job Schedulers – Overview

Simply put, a scheduler is a software utility responsible for assigning jobs and tasks to resources according to pre-determined policies and the availability of resources. A job can be comprised of one or more tasks along with relevant information on the required resources (number of nodes, GPUs, network bandwidth, application licenses etc.). Jobs are submitted to a queue for proper batch processing and optimization of resource utilization (for example, using nearest nodes instead of far-connected nodes). There may be one or more queues, each with policies around priorities, permissions etc. There are multiple options for job schedulers – commercial (closed source) and open source.

Commercial Schedulers:
Moab - a cluster workload management package from Adaptive Computing that integrates the scheduling, managing, monitoring, and reporting of cluster workloads. Moab Workload Manager is part of the Moab Cluster Suite. Moab’s development was based on the Open Source Maui job scheduling package.

Platform LSF - manages and accelerates batch workload processing for mission-critical compute- or data-intensive application workloads.

Open Source Schedulers:
SGE (Sun Grid Engine) - a distributed resource management software system. Almost identical to the commercial version - Sun N1 Grid Engine, offered by Sun Microsystems, now Oracle.

TORQUE - an open source distributed resource management system providing control over batch jobs and distributed computing resources. It is an advanced open-source product based on the original PBS project and incorporates community and professional development. TORQUE may be freely used, modified, and distributed and is designed to work with the Maui Scheduler.

Maui Scheduler - an open source job scheduler for clusters, capable of supporting an array of scheduling policies, fair share capabilities, dynamically determined priorities, and exclusive reservations. It also includes system diagnostics, extensive resource utilization tracking, statistics, and reporting engine, as well as a built-in simulator for analyzing workload, resource, and policy changes.

Platform LAVA - an open source scheduler solution based on the workload management product LSF and designed to meet a range of workload scheduling needs for clusters with up to 512 nodes.

Condor - an open source workload manager, developed at the University of Wisconsin – Madison. Condor performs the traditional batch job queuing and scheduling roles. Red Hat has based its MRG Grid product (part of Red Hat Enterprise MRG) on Condor.

Tuesday, July 19, 2011

You Can Sell Me the Future, but Be Credible…

Some folks I know prefer to skip any discussion or meeting with sales people, and some folks have no issue at all. I am among the ones that like them all, as long as they are credible. A typical sales guy will try his or her best to close the deal, sometimes by promising things that are not yet there or not yet ready. This is fine. As long as you get the expectation of when it will be available, how you are going to use it, and whether it works for you, then go for it. As long as the discussion covers the sales person's own product, we are all set.

The one thing I don't appreciate is not being credible, and not being credible means comparing your product's performance with your competitor's product without giving the real picture. The right way to do it is by making sure you have the best performance numbers your competitor can demonstrate, and only then can you perform a credible comparison. And the best way to ensure that is by using performance numbers that your competitor has already published.

If you are planning to do your own testing with your competitor's product, make sure the results are the best that can be achieved, so use the best settings for your competitor's product and compare your testing to what your competitor has already published (if it exists). Only then can you do a credible comparison. If you can't do that, don't compare at all.

The worst thing a sales person can do is to lose credibility by making fake comparisons. Once I see that, the only thing that sales person will see is my door… and a "Real or Not Real" blog post…

Friday, July 15, 2011

Cray – The Good, The Bad and in Between

Steve Scott, Cray CTO, presented a short update on the company's achievements at ISC'11. Cray is one of the vendors playing at the top of the high-performance computing market. Unlike other vendors that offer solutions for the broad range of the HPC market – from the high end down to small clusters – Cray solutions are mostly used at the high end. It seems that the company's attempt to offer commodity-based clusters for the rest of the market segments did not really work out…

In the past Cray had their own processors, interconnect etc., but in recent years they decided to use AMD CPUs and are left only with their proprietary interconnect. Personally I am not a fan of proprietary solutions and am in favor of standard and open source ones. Open source does not mean free… in most cases I do pay for support, but standard and open source enable a broad range of solutions across many usage models, and not just a limited set.

Cray has more systems than any other vendor in the top 100 of the TOP500 supercomputers list, Steve said. Cray has 23 systems in the top 100. Second is IBM with 22, third is HP with 11 and fourth is SGI with 9. On the other hand, the proprietary part of Cray's solution – the interconnect – is in the minority in the top 100. The biggest winner in the top 100 is the standards-based solution – x86 CPUs with InfiniBand (and Ethernet too, but in the top 100 Ethernet is used in a single system only) – which is used in 62 systems. One of the benefits of a standard solution is the broad range of products and vendors that one can choose from. Therefore in the top 100 the standard approach has 62 systems (IBM, Dell, HP, SGI, Appro, Bull, Acer, T-Platforms and others) while Cray has only 23. My vote goes with the standard…

In my opinion, Cray's great advantage is their software – compilers, tools etc. – not their proprietary hardware. InfiniBand, for example, is a better solution than Cray Gemini – latency, throughput, offloading etc. If Cray made their software available for standards-based solutions, that would be the best thing, but paying extra for hardware that is not better than the commodity option is a big waste.

Scott also showed a performance comparison between Gemini and InfiniBand. I saw a totally different performance set for InfiniBand in some other presentations made by the InfiniBand vendors, and it seems that Scott is not using the right numbers for his competitors... If you do decide to show numbers for competitor solutions, please make sure you have the right numbers… Integrity is the most important thing you want to maintain… especially in our small HPC community.

Wednesday, July 13, 2011

Why I Wish I Was Back in College

Do you ever think back to your college days and wish you could do it all over again? I sometimes daydream about this, and no, not just about the weekend parties :-). Sure, I can go back and take some classes to sharpen my skills, but there is one thing I wish was around when I was in the university: the Student Cluster Challenge! Over the past couple of years I have watched this annual event at Supercomputing, where they take a few university teams from around the world and hold a competition in which the teams must build a cluster, run some apps, show some performance benchmarks, and do so under a strict power budget. The kids participating are very smart and work night and day (they sleep in their designated area) to complete all their tasks before the deadline. Wow! I'm sure there is beer somewhere in the facility, too, no???
So guess what? I just saw on the ISC’12 website that they are going to do something similar: http://www.isc-events.com/isc12/Take-Part/Student-Cluster-Competition. That’s fantastic…and this is why I wish I was back in college…

Tuesday, July 12, 2011

Nothing Like a Good HPC Show

I've recently been asked what HPC shows I typically attend and what my thoughts are on the few popular ones around the world. Well, I must say, nothing beats a good technical HPC show to really get yourself deep in the clustering goo of what's being used today, the latest techniques, and future plans for what's coming next. Of course there are the exhibits, too, but that isn't always as exciting to me. I mean, most exhibitors don't have live demos in their booths these days and are mainly busy just collecting leads with gorgeous models or marketing folks who are at their first show and still don't know what their company does or why. I want to see your latest products working in a solution that matters to me, and I want to talk to a technical person who can answer some of my more advanced questions. Of course, I will be happy to take your pen, too. But enough of that…
So, with that, here is the list of conferences I like attending because of their great technical content and speakers:
  • Supercomputing (SC) – This is a no-brainer. You can't miss this event if you are even remotely in the HPC arena… yes, this includes advanced data center technologies, too, because all things that start with supercomputing eventually trickle down into other data center markets.
  • International Supercomputing (ISC) – a great show that happens mid-year so you stay abreast of emerging topics and trends. It has been on my list for what must be the last 6 years.
  • SciDAC – an important event for the US major HPC sites
  • HPC China – Without a doubt China is in high gear on their supercomputing efforts and it's important to continue to watch developments in the region. The problem is that most sessions are not in English…
  • HPC Advisory Council – Now, I haven’t attended these workshops personally (they are mostly outside of the USA) but the presentations that are posted seem very interesting. I’ve heard good things and I hope to attend one of these days.
So what do people think? Any other events you think are worthwhile? If so…why?

Monday, July 11, 2011

ISC’11 Hot Seat Sessions – InfiniBand versus InfiniBand

Thanks to Rich Brueckner, we could see a recap of the ISC'11 Hot Seat sessions with Mellanox's CTO Michael Kagan and QLogic's VP of Engineering Philip Murphy on www.InsideHPC.com.

First let me start with the final part of the Hot Seat sessions – the questions. Mr. Jack Dongarra decided to ask some weird questions, to say the least. His question to Mr. Kagan was to comment on a recent system that Mellanox had lost. Duh?? Does someone expect any vendor to win it all? Intel sometimes loses to AMD, IBM to HP, HP to SGI and so on. Instead of asking Mellanox, the leader in InfiniBand, about future InfiniBand development for example, Mr. Dongarra selected an embarrassing question – embarrassing not to Mellanox, but to himself. His question to Mr. Murphy was about InfiniBand being a player in the Exascale era, saying that he does not believe in that option. Really? Do you have other proven options? InfiniBand has been in the top 10 of the TOP500 since 2003, and I have been using it since 2005. It seems to me that either Mr. Dongarra deals with too much politics, or maybe he does not know what to ask. In any case, it might be better to find another examiner for next year.

Second, it was fascinating to compare Mr. Kagan and Mr. Murphy. I had the feeling that I was seeing a case of a leader versus a follower. Mellanox is clearly the leader in the InfiniBand market, and the first one to offer FDR InfiniBand (same as in the QDR case a few years ago, when they offered QDR a year before QLogic had anything to suggest). Michael mentioned the big FDR win at LRZ - a 3 Petaflop system (probably around 15,000 nodes in a single system). Mr. Kagan also shared his vision for the future, and did not mention his competitors. On the other hand, Mr. Murphy made some mistakes in my eyes. First he mumbled about the future of InfiniBand when he was asked about InfiniBand for the Exascale. Instead of being confident about delivering a scalable solution, he talked about the conditions for InfiniBand to be in the Exascale – it will be there if its performance continues to improve. Improved by whom? By Mellanox? Another item was when he talked about the recent LLNL system announcement. At the beginning he mentioned up to 20,000 nodes, and then he stammered and mentioned that the system will be built as 150 server islands. So it is not a single, large-scale system? In this case, why the bragging about scaling? First show a system in the top 10 of the TOP500 to get some credibility, Mr. Murphy.

And last, the proprietary angle of things. One of the things I like about InfiniBand is that it is a standard interconnect solution. A standard physical layer, standard link layer, standard transport layer and a standard software API (verbs). I have been using InfiniBand for a long time already and enjoy its scalability and the large eco-system behind it. I did not like the proprietary software APIs, such as QsNet from Quadrics and MX from Myricom. PSM from QLogic is the same bad thing. I would recommend focusing on a standard solution, instead of creating a proprietary interface that is there to hide hardware limitations. By the way, InfiniBand has been in the top 10 of the TOP500 since 2003, so please don't tell us that InfiniBand was designed for non-HPC systems, Mr. Murphy, and please don't use this excuse to defend pushing proprietary over standard.

Saturday, July 9, 2011

CPUs and GPUs – Separate Entities or an Integrated Solution?

Logic is the beginning of wisdom. We do not always use logic to determine our actions, but the more we try, the better the decisions we will make.

Today CPUs and GPUs are separate entities. The CPU is where applications are executed, and the GPU is a compute offload device onto which parts of the application can be offloaded. Even if the entire application can be executed on a GPU, GPU computing still requires a CPU for management.

Three versions of GPUs exist, or are being talked about – NVIDIA (with CUDA), AMD (with OpenCL) and Intel (with C++). Each GPU has its own programming interface, which makes it difficult to write a single program to fit them all. Some say that in the future we will end up with a single language, but while that is an option, there is still room for multiple interfaces – same as we have MPI and SHMEM.
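As a small illustration of that point, here is a minimal CUDA-only vector add (a sketch of my own, with made-up names and sizes). These few lines only run on NVIDIA hardware; to run the same thing on another vendor's GPU you would have to rewrite them, for example in OpenCL.

// Minimal CUDA vector add - illustrative only, names and sizes are made up.
#include <cuda_runtime.h>

__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one element per thread
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&c, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));             // placeholder input data
    cudaMemset(b, 0, n * sizeof(float));

    // CUDA-specific kernel launch syntax - exactly the vendor-specific part.
    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}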

With regard to integrated solutions, AMD has their Fusion program, in which the CPU and GPU are supposed to be united into a single-chip solution. NVIDIA announced their "Project Denver" to build custom CPU cores based on the ARM architecture. Intel could definitely do the same. Does this mean that GPUs will cease to exist? Logic says no. Today GPUs are much more powerful than standard CPUs, and we can assume that all the development around GPUs will continue. Therefore a gap between GPU and CPU capabilities will continue to exist, which means that integration, while possible, will still provide the ability to customize the single chip to be a "GPU flavor" or a "CPU flavor" one. The united solution can definitely share the same socket design or be provided as an add-in card, but this is only the physical packaging.

In other words, I believe that hybrid computing will continue to become an important architecture in the future, and making the compute offload devices behave more as "service providers" will make future usage easier and more flexible.

Wednesday, July 6, 2011

Affordable Exascale Computing

As we enter the Petascale era – today all of the top 10 fastest supercomputers have already demonstrated sustained Petaflop performance (as listed on the TOP500 list, http://www.top500.org/, and you can find some nice articles around it on www.InsideHPC.com) – more and more funding is being allocated or directed toward technology development for the next phase – the Exascale.

In the TOP500 there are two groups of systems. One group is the proprietary high-end systems (Cray, Blue Gene, K etc.), with limited usage only at the top of the list. These systems are not within the reach of the rest of the HPC community, and their vendors may depend on special funding programs in order to exist. The second group, with systems from the top 10 down to the last entries of the TOP500, is the standards-based solutions group (x86, with or without GPUs, InfiniBand or Ethernet connected). Within the standard group, we can see that Ethernet is used mainly for the low-end systems while InfiniBand is used across the entire range of the TOP500 list.

Having a standards-based solution that can be used from the top supercomputers down to small, workgroup or departmental-class HPC systems is critical for maintaining affordable HPC. Broad use of a technology makes it cost effective and within the reach of the entire community. I embrace all the new developments around standards-based solutions and call on the government funding organizations to support these activities as much as possible. Not having a standards-based Exascale solution will make Exascale systems too expensive to build.

Tuesday, July 5, 2011

NVIDIA GPUDirect – Here is the Complete Story

Let me start with some history here. If we follow the development of things, it appears that the first time GPUDirect was mentioned was in the NVIDIA press release announcing the GPUDirect project as a collaboration with Mellanox Technologies - "NVIDIA Tesla GPUs To Communicate Faster Over Mellanox InfiniBand Networks", http://www.nvidia.com/object/io_1258539409179.html. Since then you can find more press releases and numerous papers describing what it is and the performance gain you can achieve using it.

Today there are two GPUDirect versions – version 1.0 and version 2.0. Version 1.0 accelerates communications between GPUs located on separate servers over InfiniBand, and it exists in CUDA 3.0 (that version requires kernel patches to make it work) and in CUDA 4.0 (which does not require any kernel patches, so it is easier to use). GPUDirect version 2.0 accelerates communications between GPUs on the same server and on the same CPU chipset (it does not work if the GPUs are on separate CPU chipsets) and exists in CUDA 4.0. So if you use CUDA 4.0, you have both GPUDirect version 1 and GPUDirect version 2.
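To make the version 2.0 limitation concrete, below is a minimal sketch (my own, not from NVIDIA's documentation) that checks whether two GPUs in a node can actually use peer-to-peer access before relying on GPUDirect 2. The device indices 0 and 1 are just an example.

// Hedged sketch: probe GPUDirect 2 (peer-to-peer) availability between GPU 0 and GPU 1.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can device 0 reach device 1 directly?
    cudaDeviceCanAccessPeer(&can10, 1, 0);   // and the other way around?

    if (can01 && can10) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);    // second argument (flags) must be 0
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        printf("Peer access enabled - cudaMemcpyPeer can move data GPU to GPU\n");
    } else {
        printf("No peer access (e.g. GPUs on different chipsets) - host staging is needed\n");
    }
    return 0;
}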

GPUDirect version 1 enables better communication between remote GPUs over InfiniBand. Why InfiniBand? Because you need to use RDMA for the data communications between the GPUs, otherwise it does not work. Without RDMA support, the server CPU has to be involved in the data path, hence not much of a "GPUDirect"... Looking at the InfiniBand vendors – the one that has RDMA support is Mellanox, and the one that does not (well, they have a kind of software emulation of RDMA) is QLogic. No surprise why NVIDIA announced the GPUDirect project with Mellanox.
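To show where the host-side copies come into play, here is a hedged sketch of the usual send path from a GPU on one server to a GPU on another, using MPI over InfiniBand (the buffer names and message size are mine). As I understand it, before GPUDirect 1 the InfiniBand stack and CUDA could not pin the same host pages, which forced an additional host-to-host copy into an IB-registered buffer; with GPUDirect 1 the send can go straight from the CUDA-pinned staging buffer.

// Hedged sketch: staged GPU-to-GPU transfer across two servers via MPI.
#include <mpi.h>
#include <cuda_runtime.h>

#define N (1 << 20)   // illustrative message size (floats)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *d_buf;      // data produced/consumed on the GPU
    float *h_pinned;   // page-locked host staging buffer
    cudaMalloc(&d_buf, N * sizeof(float));
    cudaMallocHost(&h_pinned, N * sizeof(float));

    if (rank == 0) {
        // GPU -> pinned host buffer (needed with or without GPUDirect 1)
        cudaMemcpy(h_pinned, d_buf, N * sizeof(float), cudaMemcpyDeviceToHost);
        // Without GPUDirect 1, a second host-to-host copy into a separately
        // IB-registered buffer would be needed here. With GPUDirect 1 the
        // MPI/InfiniBand layer can send directly from h_pinned.
        MPI_Send(h_pinned, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(h_pinned, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_pinned, N * sizeof(float), cudaMemcpyHostToDevice);
    }

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}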

I have examined the GPUDirect performance results that both companies have published. I am sure that the cluster configurations were not identical, so one should expect a somewhat "noisy" performance comparison. When examining the performance results from both companies, I noticed that Mellanox used a single GPU per server, therefore their measurements clearly show the benefit of GPUDirect version 1. I also noticed that QLogic used two GPUs per server, therefore their measurements could well cover GPUDirect version 2 (and not GPUDirect version 1), which has nothing to do with the network... Comparing the QLogic and Mellanox numbers clearly has no meaning knowing those setup details, and even when you do compare, the difference is negligible – meaning that with half the number of GPUs, Mellanox is doing almost the same as QLogic. I now know what I want to use in my GPU platform… :-)

At the end of the day, if you are using GPUs, GPUDirect version 1 and version 2 can bring performance benefits. Just make sure you are using the right InfiniBand solution with version 1 and have the GPUs on the same CPU chipset to take advantage of version 2, and you will be all set.

Friday, July 1, 2011

My Take from the International Supercomputing Conference June 2011 (ISC’11)

I had the pleasure of being at the recent ISC'11 conference in Germany. ISC'11 is one of my preferred HPC conferences. Not too big to be overcrowded, but big enough to capture all the new developments from the vendors in the high-performance computing world. And who can say no to pork shank and beer??

So what did I see at ISC’11?

1. The new number one system on the top 500 supercomputers list – the Fujitsu K computer with 8 Petaflops of performance. A very nice achievement taking into consideration some of the issues that Japan is facing. On the other hand, Sparc and the Tofu interconnect? Seems more like Earth Simulator version 2 – a nice demonstration of capability, not much usage beyond that.

2. Xyratex announced and demonstrated their new ClusterStor™ 3000 storage solution. Good to see that Xyratex is going to be a player in the HPC storage market. We definitely need more solutions that can deliver storage at the speed of HPC. It was also nice to see how Whamcloud helps make Lustre a friendlier solution.

3. Intel's demonstration of the MIC microprocessor – not really that new, but as accelerators become more and more important, MIC can play a good role in moving HPC to the next phase. NVIDIA and AMD will play here as well, of course.

4. Mellanox demonstrating the new FDR InfiniBand 56Gb/s interconnect solutions. The InfiniBand progress over time is really impressive. I remember the 10Gb/s IB in 2002, 20Gb/s in 2005, 40Gb/s in 2008 and now 56Gb/s in 2011. There was a multi-vendor InfiniBand network demonstration on the show floor organized by the HPC Advisory Council. An impressive launch. From the other InfiniBand provider – QLogic – I did not see much. When I asked, I heard some mumbling that 56Gb/s is too fast, and that they will close the gap with the next generation of InfiniBand once it is out. Wait a second, I thought that HPC is all about faster, stronger and better. FDR is too fast? Ha? Maybe we should rename HPC to LPC (low-performance….)? Seems that QLogic is in the dark these days…. Myricom had a booth at ISC as well, but with not much to show, and not much to offer to the HPC world, it was a waste of money for them to be at ISC.

5. AMD demonstrated the new 16-core Opteron "Interlagos" CPUs for the first time. With 16 cores per socket, and with, for example, a 4-socket machine, you can have a nice 64-core server…. My next buy for my HPC center.

And not less important, the evening events were great, in particular the T-Platforms party. Thanks for the food and vodka!

The World Top 10 Supercomputers (Based on the June 2011 TOP500 List)

Twice a year, a list of the world's fastest supercomputers is published (well… the list of the known supercomputers, there are many others that are not public…). The latest list was published on June 20th, 2011 on the TOP500 web site – www.top500.org.

In the June 2011 list (the 37th TOP500 List) for the first time, all of the top 10 supercomputers demonstrated more than a Petaflop of computing performance (more than a quadrillion calculations per second…). Here is my view on the top 10 systems (the truth, if you can handle the truth….).

Geography: the USA with 5 systems, Europe (France) with 1 system, Japan with 2 systems and China with 2 systems. Interestingly enough, in the top 5, one system is from the US and 4 are from Asia. Is Asia taking the lead in HPC? Or will we see an increase in funding to enhance HPC development in the US?

Vendors: Fujitsu, NUDT, Cray (3 systems), Dawning, HP, SGI, Bull and IBM. Complete diversity.

CPUs: 5 systems use Intel, 3 systems use AMD, 1 system uses Sparc and 1 system uses Power. 80% use commodity x86 solutions. The new appearance of Sparc in the top 10 is due to the new system made by Fujitsu. Some folks see that as a second Earth Simulator – a nice demonstration of capability, not much spread beyond that.

Accelerators: 5 systems use accelerators (GPGPU, Cell) to achieve the desired performance. An interesting trend due to the compute/density/economic efficiency of the GPGPUs. My prediction is that more GPGPUs will be used (that is a safe bet…. ;-) ), and we will definitely find them as part of off-the-shelf CPUs in the not-too-far future.

Interconnect: 5 systems use Mellanox InfiniBand, 3 systems use the Cray proprietary interconnect, 1 system uses the NUDT proprietary interconnect and 1 system uses the Fujitsu Tofu proprietary interconnect. Looking at previous lists, InfiniBand as a standard has gained momentum in the top 10 systems: from 3 systems in the top 10, to 4 systems (last list), and to 5 systems in the current list. A win for the standards-based solutions is a win for all of us – most high-performance computing systems are based on standard solutions (there are many more than 500 HPC systems in the world, you know….), therefore development around standard solutions at the high end of HPC platforms brings better capabilities and feature sets to the rest of the HPC arena.

I prefer standards-based solutions over proprietary ones – better eco-system, better usage models and within the reach of everyone in need of HPC resources. Money spent on proprietary solutions is, in my view, just a big waste.

HPC-Opinion Introduction

I am one of those guys who live, eat and breathe HPC (High-Performance Computing of course). I have created HPC-Opinion to share with you my personal view on the interesting things, news and all the crazy stuff happening every day in the high-performance computing arena. I will try to analyze and shed some light on things in debate, and bring you the truth… if you can handle the truth… ;-)