[HADOOP-17408] Optimize NetworkTopology while sorting of block locations - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.3.1, 3.4.0
Component/s: common, net
Labels:
- pull-request-available

Description

In NetworkTopology, I noticed some low-hanging fruit for improving performance.

Inside sortByDistance, Collections.shuffle is performed on the list before secondarySort is called:

Collections.shuffle(list, r);
if (secondarySort != null) {
  secondarySort.accept(list);
}

However, several call sites pass Collections.shuffle as the secondarySort argument to sortByDistance, so each list is shuffled twice.
Moreover, shuffling before applying a tie-breaker is logically pointless, because the tie-breaker may override whatever order the shuffle produced.
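A minimal sketch of a sort that orders each equal-distance group exactly once, assuming (as the description implies) that nodes are first grouped by distance and each group is then ordered. SortByDistanceSketch and its signature are hypothetical simplifications, not Hadoop's actual NetworkTopology API. When a tie-breaker is supplied it replaces the shuffle instead of following it; only when there is none does the group get shuffled, using ThreadLocalRandom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;
import java.util.function.ToIntFunction;

public class SortByDistanceSketch {

  /**
   * Hypothetical, simplified sortByDistance: groups nodes by distance,
   * then orders each equal-distance group once (tie-breaker OR shuffle).
   */
  static <T> void sortByDistance(List<T> nodes,
                                 ToIntFunction<T> distance,
                                 Consumer<List<T>> secondarySort) {
    // Bucket nodes by distance; TreeMap iterates buckets nearest-first.
    TreeMap<Integer, List<T>> buckets = new TreeMap<>();
    for (T node : nodes) {
      buckets.computeIfAbsent(distance.applyAsInt(node), k -> new ArrayList<>())
             .add(node);
    }
    nodes.clear();
    for (List<T> bucket : buckets.values()) {
      if (secondarySort != null) {
        // The caller's tie-breaker decides the order; no prior shuffle.
        secondarySort.accept(bucket);
      } else {
        // No tie-breaker: randomize ties once, without a shared Random.
        Collections.shuffle(bucket, ThreadLocalRandom.current());
      }
      nodes.addAll(bucket);
    }
  }

  public static void main(String[] args) {
    List<String> nodes =
        new ArrayList<>(Arrays.asList("d2a", "d0", "d1a", "d2b", "d1b"));
    // Hypothetical distance: the digit after 'd'.
    sortByDistance(nodes, s -> s.charAt(1) - '0', Collections::sort);
    System.out.println(nodes); // [d0, d1a, d1b, d2a, d2b]
  }
}
```

With Collections::sort as the tie-breaker the output order is deterministic; passing null instead randomizes the order within each distance group while keeping the groups nearest-first.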

In addition, daryn reported that:

- the topology unnecessarily acquires and releases its lock to calculate the distance for every node
- shuffling uses a seeded Random instead of ThreadLocalRandom, and that shared Random is heavily synchronized
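Both of daryn's points can be illustrated with a small sketch. It assumes a ReentrantReadWriteLock as a stand-in for the topology lock and a caller-supplied distance function; BatchedDistanceSketch and its methods are hypothetical names, not Hadoop code. The first method shows the per-node lock/unlock pattern, the second holds the read lock once for the whole list, and shuffleTies uses ThreadLocalRandom.current(). A shared java.util.Random is thread-safe, so threads calling it concurrently contend on its internal seed; ThreadLocalRandom gives each thread its own generator.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.ToIntFunction;

public class BatchedDistanceSketch {
  // Hypothetical stand-in for the lock guarding the topology tree.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /** Pattern reported as wasteful: one lock/unlock round trip per node. */
  <T> int[] distancesPerNodeLock(List<T> nodes, ToIntFunction<T> distance) {
    int[] out = new int[nodes.size()];
    for (int i = 0; i < nodes.size(); i++) {
      lock.readLock().lock();
      try {
        out[i] = distance.applyAsInt(nodes.get(i));
      } finally {
        lock.readLock().unlock();
      }
    }
    return out;
  }

  /** Batched alternative: take the read lock once for the whole list. */
  <T> int[] distancesBatched(List<T> nodes, ToIntFunction<T> distance) {
    int[] out = new int[nodes.size()];
    lock.readLock().lock();
    try {
      for (int i = 0; i < nodes.size(); i++) {
        out[i] = distance.applyAsInt(nodes.get(i));
      }
    } finally {
      lock.readLock().unlock();
    }
    return out;
  }

  /** Shuffle with the calling thread's own generator, not a shared Random. */
  static <T> void shuffleTies(List<T> ties) {
    Collections.shuffle(ties, ThreadLocalRandom.current());
  }

  public static void main(String[] args) {
    BatchedDistanceSketch topo = new BatchedDistanceSketch();
    List<String> nodes = Arrays.asList("/rack1/h1", "/rack2/h2", "/rack1/h3");
    // Hypothetical distance: 0 if on the reader's rack (/rack1), else 2.
    ToIntFunction<String> dist = n -> n.startsWith("/rack1/") ? 0 : 2;
    System.out.println(Arrays.toString(topo.distancesBatched(nodes, dist))); // [0, 2, 0]
  }
}
```

Holding the lock for the whole loop trades a somewhat longer critical section for far fewer lock operations; because it is a read lock, other readers can still proceed concurrently.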


Issue Links

links to

GitHub Pull Request #2601


People

Assignee: Ahmed Hussein

Reporter: Ahmed Hussein

Votes: 0

Watchers: 3

Dates

Created: 03/Dec/20 15:29

Updated: 10/Jan/21 04:23

Resolved: 08/Jan/21 20:03

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2h 10m