Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
In large-scale experiments on a 6-rack cluster with an aggregator core network topology, overall cluster bandwidth utilization was limited.
In aggregator core networks, nodes and racks are not equidistant, so a broadcast operation can be inefficient: the broadcasting node has to send the same data across the core N times, once to each of the N nodes on a remote rack.
Ideally, row batches should be sent once per remote rack, and a node on each remote rack would then broadcast them within its own rack.
The table below shows rack-to-rack latency for 90% of operations (the 90th percentile). The ratio between the best and worst case is 7.3x.
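To illustrate the difference, here is a minimal sketch that counts cross-rack transfers for a flat broadcast versus the two-level, rack-aware broadcast described above. It is not Impala code: the function names, the node naming scheme, the 4-nodes-per-rack layout, and the relay selection rule are all assumptions made for the example.

```python
from collections import defaultdict


def flat_broadcast(sender, rack_of):
    """Sender transmits one copy directly to every other node."""
    return [(sender, node) for node in rack_of if node != sender]


def rack_aware_broadcast(sender, rack_of):
    """Send one copy per remote rack; a relay node fans out within its rack."""
    by_rack = defaultdict(list)
    for node, rack in rack_of.items():
        by_rack[rack].append(node)

    transfers = []
    for rack, nodes in by_rack.items():
        if rack == rack_of[sender]:
            # Local rack: the sender reaches its neighbours directly.
            transfers += [(sender, n) for n in nodes if n != sender]
        else:
            # Hypothetical relay choice: the lexicographically smallest node.
            relay = min(nodes)
            transfers.append((sender, relay))
            transfers += [(relay, n) for n in nodes if n != relay]
    return transfers


def cross_rack_count(transfers, rack_of):
    """Number of transfers whose source and destination are on different racks."""
    return sum(1 for src, dst in transfers if rack_of[src] != rack_of[dst])


# Assumed layout: 6 racks with 4 nodes each (the node count is hypothetical).
rack_of = {f"r{r}-n{n}": f"r{r}" for r in range(6) for n in range(4)}
sender = "r0-n0"

flat = flat_broadcast(sender, rack_of)
aware = rack_aware_broadcast(sender, rack_of)

print(cross_rack_count(flat, rack_of))   # 20 copies cross the core
print(cross_rack_count(aware, rack_of))  # 5 copies cross the core
```

Both plans perform the same total number of transfers (23 in this layout); the rack-aware plan only moves most of them off the inter-rack links, at the cost of an extra hop for nodes behind a relay.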
Rack | va | vc | vd1 | vd3 | ve |
---|---|---|---|---|---|
va | 4,238 | 4,290 | 9,692 | 8,897 | 8,208 |
vc | 9,290 | 4,396 | 30,952 | 13,529 | 14,578 |
vd1 | 9,131 | 29,066 | 4,346 | 17,265 | 16,849 |
vd3 | 7,409 | 15,517 | 17,265 | 4,370 | 4,687 |
ve | 4,914 | 16,894 | 16,430 | 4,713 | 4,472 |
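As a quick check of the 7.3x figure, the sketch below recomputes the best-to-worst ratio from the table values. The dictionary layout is an assumption for the example (outer key is the row label, inner key is the column label); the issue does not state the latency unit or which axis is the sender, so neither is assumed here.

```python
# Rack-to-rack latency values copied from the table above (unit not stated).
latency = {
    "va":  {"va": 4238, "vc": 4290,  "vd1": 9692,  "vd3": 8897,  "ve": 8208},
    "vc":  {"va": 9290, "vc": 4396,  "vd1": 30952, "vd3": 13529, "ve": 14578},
    "vd1": {"va": 9131, "vc": 29066, "vd1": 4346,  "vd3": 17265, "ve": 16849},
    "vd3": {"va": 7409, "vc": 15517, "vd1": 17265, "vd3": 4370,  "ve": 4687},
    "ve":  {"va": 4914, "vc": 16894, "vd1": 16430, "vd3": 4713,  "ve": 4472},
}

# Flatten to (row, column, value) triples so the extreme pairs can be reported.
cells = [(row, col, v) for row, cols in latency.items() for col, v in cols.items()]
best = min(cells, key=lambda c: c[2])
worst = max(cells, key=lambda c: c[2])
ratio = worst[2] / best[2]

print(best)             # ('va', 'va', 4238)
print(worst)            # ('vc', 'vd1', 30952)
print(f"{ratio:.1f}x")  # 7.3x
```

The best case is an intra-rack entry (va to va) and the worst is between vc and vd1, which is consistent with the racks not being equidistant.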
Issue Links
- is related to
  - IMPALA-2424 Rack-aware scheduling (Open)
  - IMPALA-6196 Track per-remote-host RPC metrics (Resolved)