Monday, July 11, 2011

ISC’11 Hot Seat Sessions – InfiniBand versus InfiniBand

Thanks to Rich Brueckner, we can watch a recap of the ISC’11 Hot Seat sessions of Mellanox’s CTO Michael Kagan and QLogic’s VP of engineering Philip Murphy on

First, let me start with the final part of the Hot Seat sessions: the questions. Mr. Jack Dongarra decided to ask some weird questions, to say the least. He asked Mr. Kagan to comment on a recent system that Mellanox lost. Duh?? Does anyone expect any vendor to win them all? Intel sometimes loses to AMD, IBM to HP, HP to SGI and so on. Instead of asking Mellanox, the leader in InfiniBand, about future InfiniBand development, for example, Mr. Dongarra chose an embarrassing question, embarrassing not to Mellanox but to himself. His question to Mr. Murphy was about InfiniBand being a player in Exascale, and he said that he does not believe in that option. Really? Are there other proven options? InfiniBand has been in the top 10 of the TOP500 since 2003, and I have been using it since 2005. It seems to me that either Mr. Dongarra deals with too much politics, or maybe he does not know what to ask. In any case, it might be better to find another examiner for next year.

Second, it was fascinating to compare Mr. Kagan and Mr. Murphy. I had the feeling I was watching a leader versus a follower. Mellanox is clearly the leader in the InfiniBand market, and the first to offer FDR InfiniBand (the same as with QDR a few years ago, when they offered QDR a year before QLogic had anything to offer). Mr. Kagan mentioned the big FDR win at LRZ, a 3 Petaflop system (probably around 15,000 nodes in a single system). He also presented his vision for the future, and did not mention his competitors. Mr. Murphy, on the other hand, made some mistakes in my eyes. First, he mumbled about the future of InfiniBand when he was asked about InfiniBand for Exascale. Instead of being confident that he can deliver a scalable solution, he talked about the conditions for InfiniBand to be in Exascale: it will be there if its performance continues to improve. Improved by whom? By Mellanox? Another mistake came when he talked about the recent LLNL system announcement. At the beginning he mentioned up to 20,000 nodes, and then he stammered and mentioned that the system will be built as 150 server islands. So it is not a single, large-scale system? In that case, why brag about scaling? First show a system in the top 10 of the TOP500 to get some credibility, Mr. Murphy.

And last, the proprietary angle of things. One of the things I like about InfiniBand is that it is a standard interconnect solution: a standard physical layer, a standard link layer, a standard transport layer and a standard software API (verbs). I have been using InfiniBand for a long time and I enjoy its scalability and the large ecosystem behind it. I did not like the proprietary software APIs, such as QsNet from Quadrics and MX from Myricom. PSM from QLogic is the same bad thing. I would recommend focusing on a standard solution, instead of creating a proprietary interface that is there to hide hardware limitations. By the way, InfiniBand has been in the top 10 of the TOP500 since 2003, so please don’t tell us that InfiniBand was designed for non-HPC systems, Mr. Murphy, and please don’t use this excuse to defend pushing proprietary over standard.
