[HDDS-4970] Significant overhead when DataNode is over-subscribed - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.3.0
Component/s: Ozone Datanode
Labels: pull-request-available

Description

I ran a microbenchmark in which concurrent clients read chunks from a single DataNode.

As the number of clients grows, the DataNode spends a significant amount of time accessing a concurrent hash map, and that overhead appears to grow exponentially with the number of clients.

ChunkUtils#processFileExclusively

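  @VisibleForTesting
  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    // Busy-wait: retry the add until this thread is the one that inserts
    // the path. Waiting threads never block; they keep calling LOCKS.add.
    for (;;) {
      if (LOCKS.add(path)) {
        break;
      }
    }

    try {
      return op.get();
    } finally {
      // Release the path so that one of the spinning threads can proceed.
      LOCKS.remove(path);
    }
  }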

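In my test, 64 concurrent clients reading chunks from a single-disk DataNode caused the DN to spend nearly half of its time adding to the LOCKS object (a concurrent hash map). The loop above busy-waits: a thread that fails to add the path retries immediately instead of blocking, so every waiting client keeps hitting the map for as long as another client holds the path.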
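To make the spinning visible, here is a minimal sketch that counts how many add() calls fail while several threads contend for one key. It does not reproduce the microbenchmark from this report: the class name SpinCount, the method failedAdds, and the key "chunk" are hypothetical, and the critical section is left empty.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class SpinCount {
  // Hypothetical helper: starts `threads` workers that each take the same
  // key `opsPerThread` times with the spin-on-add pattern, and returns the
  // total number of add() calls that failed (that is, wasted retries).
  static long failedAdds(int threads, int opsPerThread)
      throws InterruptedException {
    Set<String> locks = ConcurrentHashMap.newKeySet();
    LongAdder failed = new LongAdder();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < opsPerThread; j++) {
          // Same shape as the loop in processFileExclusively.
          while (!locks.add("chunk")) {
            failed.increment();
          }
          try {
            // The chunk read would happen here.
          } finally {
            locks.remove("chunk");
          }
        }
      });
      workers[i].start();
    }
    for (Thread w : workers) {
      w.join();
    }
    return failed.sum();
  }
}
```

With a single thread the count is always zero. With more threads it varies from run to run, and every failed call is CPU time spent on the map rather than on reading chunks.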

Given that it is not uncommon to find HDFS DataNodes with tens of thousands of incoming client connections, I expect to see similar traffic to an Ozone DataNode at scale.

We should fix this code.
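One possible direction, sketched under assumptions and not necessarily the change made in the linked pull request, is to make waiting threads block on a per-path lock instead of spinning. The class PathLocks, its Entry holder, and trackedPaths are hypothetical names; only JDK classes are used.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class PathLocks {
  // Hypothetical holder: one lock per path plus the number of threads
  // currently holding or waiting for it.
  private static final class Entry {
    final ReentrantLock lock = new ReentrantLock();
    int users; // mutated only inside ConcurrentHashMap.compute
  }

  private static final ConcurrentHashMap<Path, Entry> LOCKS =
      new ConcurrentHashMap<>();

  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    // Atomically register interest, creating the entry on first use.
    Entry entry = LOCKS.compute(path, (p, cur) -> {
      Entry e = (cur == null) ? new Entry() : cur;
      e.users++;
      return e;
    });
    // Waiting threads are parked here instead of retrying in a loop.
    entry.lock.lock();
    try {
      return op.get();
    } finally {
      entry.lock.unlock();
      // Drop the entry once no thread holds or waits for this path.
      LOCKS.compute(path, (p, cur) -> (--cur.users == 0) ? null : cur);
    }
  }

  // Number of paths that currently have an entry; useful for leak checks.
  static int trackedPaths() {
    return LOCKS.size();
  }
}
```

Each caller touches the map twice, once to register and once to release, no matter how long it waits, so map traffic no longer scales with wait time. The reference count keeps an entry alive while any thread still needs it and removes it afterwards, so the map does not grow with the number of distinct paths seen.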

Attachments

- ozone_dn-rhel08.ozone.local.html, 03/Aug/22 20:25, 2.29 MB, Wei-Chiu Chuang
- Screen Shot 2021-03-11 at 11.58.23 PM.png, 12/Mar/21 01:25, 624 kB, Wei-Chiu Chuang
- Screen Shot 2022-08-02 at 5.43.15 PM.png, 03/Aug/22 17:35, 516 kB, Wei-Chiu Chuang
- Screen Shot 2022-08-02 at 5.45.34 PM.png, 03/Aug/22 17:35, 264 kB, Wei-Chiu Chuang

Issue Links

is broken by: HDDS-2026 Overlapping chunk region cannot be read concurrently (Resolved)

links to: GitHub Pull Request #3654

People

Assignee: Wei-Chiu Chuang
Reporter: Wei-Chiu Chuang
Votes: 0
Watchers: 6

Dates

Created: 12/Mar/21 01:29
Updated: 05/Aug/22 06:49
Resolved: 05/Aug/22 06:49