iWARP stands for Internet Wide Area RDMA Protocol, and it is an Internet Engineering Task Force (IETF) update of the RDMA Consortium’s earlier RDMA over TCP specification. The main component of the iWARP protocol is the Direct Data Placement protocol (DDP), which permits zero-copy transmission. DDP itself does not perform the transmission; TCP does. However, TCP does not respect message boundaries: it sends data as a stream of bytes without regard to protocol data units (PDUs). In this regard, DDP may be better suited to SCTP, and indeed a focus of the IETF has been to standardize RDMA over SCTP. Running DDP over TCP requires a tweak known as Marker PDU Aligned (MPA) framing to guarantee message boundaries. Furthermore, DDP is not intended to be accessed directly. Instead, a separate RDMA protocol (RDMAP) provides the services to read and write data. Therefore, the entire RDMA over TCP specification means RDMAP over DDP over MPA over TCP. Complex enough? Absolutely…
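To make the message-boundary problem concrete, here is a minimal sketch of why a byte stream needs a framing layer at all. It is not the MPA wire format (real MPA also adds markers, padding, and a CRC); the 2-byte length prefix and the names `frame` and `Deframer` are assumptions made up for this illustration.

```python
import struct
from typing import List


def frame(message: bytes) -> bytes:
    """Prefix a message with a 2-byte big-endian length.

    Hypothetical framing for illustration only, not the MPA format.
    """
    if len(message) > 0xFFFF:
        raise ValueError("message too long for a 2-byte length field")
    return struct.pack("!H", len(message)) + message


class Deframer:
    """Rebuilds whole messages from arbitrarily split stream chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return every message completed so far."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from("!H", self._buf, 0)
            if len(self._buf) < 2 + length:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return messages


if __name__ == "__main__":
    # Simulate TCP delivering the stream in 3-byte pieces that ignore
    # where one message ends and the next begins.
    stream = frame(b"hello") + frame(b"rdma write")
    deframer = Deframer()
    for i in range(0, len(stream), 3):
        for msg in deframer.feed(stream[i:i + 3]):
            print(msg)
```

The receiver only hands a message upward once the full length has arrived, which is the guarantee a layer like DDP needs from whatever sits between it and TCP.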
iWARP does not have a standard programming interface, and it offers only a single communication service option: a reliable, connection-oriented one. This is the only service that TCP and SCTP can provide, so iWARP lacks some of the features that other RDMA solutions provide, such as atomic operations. Since the kernel implementation of the TCP stack is a tremendous bottleneck, a few vendors have tried to implement TCP in the networking hardware. In some cases, error correction (a mechanism of TCP) is still performed in software, while the more frequently performed communications are handled by logic embedded on the NIC. This additional hardware is known as a TCP offload engine (TOE). A TOE by itself does not prevent copying on the receive side, of course, and must be combined with RDMA technology for zero-copy. Complex enough? Absolutely…
The main (if not the only) vendors that offer iWARP solutions today are Intel (through the acquisition of NetEffect) and Chelsio Communications. There is a reason why “Internet” is part of the iWARP name… Due to its complexity, iWARP does not provide any performance benefits for HPC workloads. Latency is higher, bandwidth is the same as any other Ethernet solution, and the high CPU overhead is still there. I am better off running directly over TCP than running my job over RDMAP over DDP over MPA over TCP…