Tuesday, March 22, 2016

Co-Processors - Intel Knights Landing

In the past I wrote about NVIDIA GPUs, and about the GPU-Direct technology that enables direct communication between the NVIDIA GPU and the cluster interconnect (for example InfiniBand). Since then, there have been several enhancements to the GPU-Direct technology, and I will try to cover them in future posts.

In this post I would like to give my view on the upcoming Intel Knights Landing Xeon Phi co-processor. Knights Landing is the next-generation Xeon Phi after Knights Corner. Knights Corner was not a success story, to say the least, and Intel aims to gain some traction with the next-generation Knights Landing.

Knights Landing has 72 cores and, more importantly, it is a bootable device, so one can use Knights Landing as the main CPU in a server platform. This is a nice capability, as one can, for example, build or use single-CPU (Knights Landing) boards.

On the connectivity side, Knights Landing will be provided in two versions (packages): KNL and KNL-F. KNL is the Knights Landing CPU, with two PCI-Express x16 interfaces and one PCI-Express x4 interface. The two x16 interfaces can be connected to any network solution. KNL-F is a packaged Knights Landing with two Intel Omni-Path ASICs, each connected to the KNL ASIC via PCI-Express x16. PCI-Express x4 connectivity will be available out of the KNL-F package for management options.
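To put those lane counts in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the links run at PCI-Express Gen3 rates (8 GT/s per lane with 128b/130b encoding); the generation is my assumption and is not stated above. It also ignores protocol overhead, so real throughput will be lower. The function name and the link labels are made up for illustration.

```python
# Assumption: PCI-Express Gen3 signalling (not stated in the post).
GT_PER_S_PER_LANE = 8.0      # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie_gbytes_per_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return lanes * GT_PER_S_PER_LANE * ENCODING_EFFICIENCY / 8  # bits -> bytes


# Hypothetical labels for the three KNL interfaces described above.
links = {"x16 #1": 16, "x16 #2": 16, "x4": 4}

for name, lanes in links.items():
    print(f"{name}: {pcie_gbytes_per_s(lanes):.2f} GB/s per direction")
```

Under these assumptions each x16 link has roughly four times the raw bandwidth of the x4 link, which is why the x16 pair is the natural place for the network adapters.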

KNL has been tested with InfiniBand and works great using the OpenFabrics OFED distribution or the Mellanox OFED distribution. It has already been deployed at several sites.

KNL-F usage is questionable. It makes sense for Intel to try to lock users into its proprietary interconnect product (which is no more than a new version of the QLogic TrueScale product), but why would one want to be locked into Omni-Path? Especially as Omni-Path requires the KNL to spend many expensive core cycles managing and operating the Omni-Path network (which means a loss of KNL performance)? TrueScale was not used with GPUs in the past due to its overhead, and Omni-Path, which is based on the same TrueScale architecture, is no better.

