Apache Ozone / HDDS-4970

Significant overhead when DataNode is over-subscribed

Details

    Description

      I ran a microbenchmark in which concurrent clients read chunks from a DataNode.

      As the number of clients grows, a significant amount of time goes to accessing a concurrent hash map. The overhead grows exponentially with the number of clients.

      ChunkUtils#processFileExclusively
        @VisibleForTesting
        static <T> T processFileExclusively(Path path, Supplier<T> op) {
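          // Busy-wait: spin until this thread manages to insert the path into LOCKS.
          // Every failed add() is a wasted operation on the shared concurrent set.
          for (;;) {
            if (LOCKS.add(path)) {
              break;
            }
          }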
      
          try {
            return op.get();
          } finally {
            LOCKS.remove(path);
          }
        }
      

      In my test, 64 concurrent clients reading chunks from a 1-disk DataNode caused the DN to spend nearly half of its time adding to the LOCKS object (a concurrent hash map).
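To make the measurement above easier to reason about, here is a minimal, self-contained sketch that reproduces the same spin-on-add pattern outside of Ozone and counts how many add() attempts fail. It is not the benchmark used for this report: the class name SpinContentionDemo, the FAILED_ADDS counter, the run() helper and the "chunk-0" path are all hypothetical, and LOCKS is assumed to behave like a concurrent key set.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class SpinContentionDemo {
  // Assumption: LOCKS is modelled here as a concurrent key set of paths.
  private static final Set<Path> LOCKS = ConcurrentHashMap.newKeySet();

  // Hypothetical counter: number of add() calls that lost the race.
  static final LongAdder FAILED_ADDS = new LongAdder();

  // Same shape as the method quoted above, plus the failure counter.
  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    while (!LOCKS.add(path)) {
      FAILED_ADDS.increment(); // wasted attempt while another thread holds the path
    }
    try {
      return op.get();
    } finally {
      LOCKS.remove(path);
    }
  }

  // Starts `threads` workers that each perform `opsPerThread` exclusive
  // operations on one shared path, and returns the number of completed ops.
  static long run(int threads, int opsPerThread) throws InterruptedException {
    Path path = Paths.get("chunk-0"); // hypothetical path, never opened
    long[] completed = {0};           // only touched while the path is held
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < opsPerThread; j++) {
          processFileExclusively(path, () -> ++completed[0]);
        }
      });
      workers[i].start();
    }
    for (Thread w : workers) {
      w.join();
    }
    return completed[0];
  }
}
```

Calling SpinContentionDemo.run(n, 10_000) for increasing n and reading FAILED_ADDS.sum() after each run gives a rough picture of how much work is spent spinning. The exact numbers depend on the machine and the JVM, so none are quoted here.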

      Given that it is not uncommon to find HDFS DataNodes with tens of thousands of incoming client connections, I expect to see similar traffic to an Ozone DataNode at scale.

      We should fix this code.
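One possible direction, sketched here as an illustration and not as the fix that was eventually committed: keep one lock per path and let waiting threads block on it instead of spinning on add(). The class name PathLocks, the Entry helper and activeLocks() are hypothetical. The sketch assumes that callers only need mutual exclusion per path, and it uses only java.util.concurrent.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class PathLocks {
  // A lock plus the number of threads that hold it or wait for it.
  private static final class Entry {
    final ReentrantLock lock = new ReentrantLock();
    int users; // only modified inside ConcurrentHashMap.compute (atomic per key)
  }

  private static final ConcurrentHashMap<Path, Entry> LOCKS =
      new ConcurrentHashMap<>();

  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    // Register interest in the path, creating the entry on first use.
    Entry entry = LOCKS.compute(path, (p, cur) -> {
      Entry e = (cur == null) ? new Entry() : cur;
      e.users++;
      return e;
    });

    entry.lock.lock(); // contended threads are parked instead of busy-waiting
    try {
      return op.get();
    } finally {
      entry.lock.unlock();
      // Drop the entry once nobody holds or waits for it, so the map
      // does not grow with the number of distinct paths ever seen.
      LOCKS.compute(path, (p, cur) -> (--cur.users == 0) ? null : cur);
    }
  }

  // Number of paths that currently have an entry; useful for tests.
  static int activeLocks() {
    return LOCKS.size();
  }
}
```

Each call costs two map operations, as in the original code, but a thread that loses the race waits on the lock and stops hitting the map. One behavioral difference to be aware of: ReentrantLock lets the same thread re-enter for the same path, whereas the spin loop would never return in that case.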

      Attachments

        1. ozone_dn-rhel08.ozone.local.html
          2.29 MB
          Wei-Chiu Chuang
        2. Screen Shot 2021-03-11 at 11.58.23 PM.png
          624 kB
          Wei-Chiu Chuang
        3. Screen Shot 2022-08-02 at 5.43.15 PM.png
          516 kB
          Wei-Chiu Chuang
        4. Screen Shot 2022-08-02 at 5.45.34 PM.png
          264 kB
          Wei-Chiu Chuang

            People

              weichiu Wei-Chiu Chuang