In the past I reviewed two cluster topologies: Fat Tree
(Clos) and Torus. The dragonfly is a hierarchical topology with the following
properties: several groups are connected together using all-to-all links (i.e.
each group has at least one link directly to each other group), the topology
inside each group can be any topology, and efficient operation requires
non-minimal global adaptive routing and advanced congestion look-ahead. Simply
put, a dragonfly is a topology of at least two levels where, at the top level,
groups of switches are connected in a full graph. The internal structure of the
groups may vary and can be built as a full graph, fat tree, torus, mesh,
another dragonfly and so on.
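To make the structure concrete, here is a minimal Python sketch of a two-level dragonfly. It rests on several assumptions that are mine, not part of any real system: each group is a full graph internally, there is exactly one global link per pair of groups, and global links are spread round-robin over the switches of a group. The names `build_dragonfly` and `hop_count` are hypothetical helpers made up for this illustration.

```python
from collections import deque
from itertools import combinations


def build_dragonfly(num_groups, switches_per_group):
    """Return a set of undirected links between (group, switch) nodes."""
    links = set()

    # Local links: every group is a full graph here (an assumption;
    # any intra-group topology is allowed in a dragonfly).
    for g in range(num_groups):
        for a, b in combinations(range(switches_per_group), 2):
            links.add(((g, a), (g, b)))

    # Global links: one link per pair of groups (an assumption; there may
    # be more), spread round-robin over the switches of each group.
    for g1, g2 in combinations(range(num_groups), 2):
        s1 = (g2 - 1) % switches_per_group  # switch in g1 facing g2
        s2 = g1 % switches_per_group        # switch in g2 facing g1
        links.add(((g1, s1), (g2, s2)))

    return links


def hop_count(links, src, dst):
    """Minimal number of switch-to-switch hops from src to dst (BFS)."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None  # dst is not reachable from src


links = build_dragonfly(num_groups=5, switches_per_group=4)
```

In this small example, `hop_count(links, (0, 0), (1, 0))` is 1 because those two switches happen to hold the direct link between groups 0 and 1, while `hop_count(links, (0, 1), (1, 1))` is 3: one local hop, one global hop and one more local hop. So even the minimal route between two groups is often more than a single hop at the switch level.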
While many describe the dragonfly as a topology with one hop
between groups, this is not correct, and in many cases network
traffic will travel over several hops before reaching its destination. The key to
making a dragonfly topology effective is to allow some pairs of end-nodes to
communicate over a non-minimal route. It is the only way to distribute random
group-to-group traffic. This represents a significant difference from other
topologies. To support such routing, a dragonfly system needs to use adaptive
routing and to send traffic on the longer paths only when congestion is
impacting the minimal-hop paths.
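The routing decision described above can be sketched as a simple threshold rule at the group level. This is only an illustration under stated assumptions, not the algorithm of any actual dragonfly implementation: the function `choose_route`, the `link_load` dictionary (utilization between 0 and 1 per pair of groups) and the `threshold` value are all hypothetical.

```python
def choose_route(src_group, dst_group, num_groups, link_load, threshold=0.8):
    """Return the list of groups a packet traverses.

    link_load maps a (low_group, high_group) pair to a utilization in
    [0, 1]; pairs that are missing are treated as idle. Both the data
    structure and the threshold are assumptions made for this sketch.
    """
    def load(a, b):
        return link_load.get((min(a, b), max(a, b)), 0.0)

    if src_group == dst_group:
        return [src_group]

    minimal = [src_group, dst_group]

    # Stay on the minimal path unless its global link is congested.
    if load(src_group, dst_group) < threshold:
        return minimal

    # Minimal link is congested: consider a detour through one
    # intermediate group, rating each detour by its busier global link.
    candidates = [g for g in range(num_groups)
                  if g not in (src_group, dst_group)]
    if not candidates:
        return minimal

    best = min(candidates,
               key=lambda g: max(load(src_group, g), load(g, dst_group)))

    # Take the longer path only if it is actually less loaded.
    if max(load(src_group, best), load(best, dst_group)) < load(src_group, dst_group):
        return [src_group, best, dst_group]
    return minimal


busy = {(0, 1): 0.95, (0, 2): 0.9, (1, 2): 0.1, (0, 3): 0.2, (1, 3): 0.3}
route_idle = choose_route(0, 1, 4, {})
route_busy = choose_route(0, 1, 4, busy)
```

With no load information the packet stays on the direct link, so `route_idle` is `[0, 1]`. With the `busy` loads the direct link between groups 0 and 1 is above the threshold, and the detour through group 3 is the least loaded, so `route_busy` is `[0, 3, 1]`. A real system has to make this choice with congestion information that is local or delayed, which is where the complexity of adaptive routing comes from.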
While the goal of the dragonfly was to enable a higher bisection bandwidth
than a torus (or similar) topology while reducing overall cost (mainly
cable lengths), in practice it does not provide such an advantage, in either
performance or cost, over a fat tree topology, for example. On the other hand, it
does add complexity through the need to enable adaptive routing. So
far we prefer the fat tree option, either in a full bisection bandwidth
configuration or with some oversubscription, depending on the targeted
applications.