
HDFS-15937: Reduce memory used during datanode layout upgrade



    Description

      When the datanode block layout is upgraded from -56 (256x256 subdirectories) to -57 (32x32), we have found the datanode uses a lot more memory than usual.
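      For context, the layout version controls how a block ID maps to a pair of subdirectories. A rough sketch of the mapping, modelled on DatanodeUtil.idToBlockDir (masks shown for the 32x32 layout; the real code may differ in detail):

      import java.io.File;

      final class BlockDirs {
        // The -57 layout keeps 5 bits per level (32x32 = 1024 subdirs);
        // the older -56 layout kept 8 bits per level (256x256 = 65536).
        static File idToBlockDir(File root, long blockId) {
          int d1 = (int) ((blockId >> 16) & 0x1F); // 0x1F = 5 bits -> 32 dirs
          int d2 = (int) ((blockId >> 8) & 0x1F);
          return new File(root, "subdir" + d1 + File.separator + "subdir" + d2);
        }
      }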

      For each volume, the blocks are scanned and a list is created holding a series of LinkArgs objects. Each object contains a File object for the block source and destination, and each File stores the full path as a string, e.g.:

      /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
      /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825

      This string is repeated for every block and meta file on the DN, and much of the string is the same each time, leading to a large amount of memory used.
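      To illustrate, the existing structure is roughly the following (a simplified sketch; the real LinkArgs class inside DataStorage may differ in detail):

      import java.io.File;

      // Each block contributes one LinkArgs instance, and each instance
      // holds two File objects whose path strings repeat the same long
      // directory prefix, so most characters are duplicated per file.
      class LinkArgs {
        final File src; // e.g. .../finalized/subdir0/subdir0/blk_1073741825
        final File dst; // e.g. .../finalized/subdir0/subdir10/blk_1073741825

        LinkArgs(File src, File dst) {
          this.src = src;
          this.dst = dst;
        }
      }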

      If we change the LinkArgs to store:

      • Src path without the block, e.g. /data01/dfs/dn/previous.tmp/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0
      • Dest path without the block, e.g. /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir10
      • Block / meta file name, e.g. blk_12345678_1001 or blk_12345678_1001.meta

      and then ensure we reuse the same File object for repeated src and dest paths, we can save most of the memory without reworking the logic of the code.

      The current logic walks the source paths recursively, so the src path object can easily be re-used.

      For the destination path, there are only 32x32 (1024) distinct paths, so we can simply cache them in a HashMap and look up the re-usable object each time, as sketched below.
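      A minimal sketch of the reworked structure under these assumptions (hypothetical names; the committed patch may organise this differently):

      import java.io.File;
      import java.util.HashMap;
      import java.util.Map;

      // The long directory prefix is now held once per subdir in a shared
      // File object; only the short block/meta file name is stored per block.
      class LinkArgs {
        final File srcDir;     // shared by all blocks in the same source subdir
        final File dstDir;     // shared via the cache below
        final String fileName; // e.g. "blk_12345678_1001" or "blk_12345678_1001.meta"

        LinkArgs(File srcDir, File dstDir, String fileName) {
          this.srcDir = srcDir;
          this.dstDir = dstDir;
          this.fileName = fileName;
        }

        File src() { return new File(srcDir, fileName); }
        File dst() { return new File(dstDir, fileName); }
      }

      // Only 32x32 = 1024 distinct destination directories exist, so a small
      // map lets every block headed for the same subdir share one File object.
      class DstDirCache {
        private final Map<String, File> cache = new HashMap<>();

        File get(File root, int d1, int d2) {
          return cache.computeIfAbsent(d1 + "/" + d2,
              k -> new File(root, "subdir" + d1 + File.separator + "subdir" + d2));
        }
      }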

      I tested locally by generating 100k block files and attempting the layout upgrade. A heap dump showed the 100k blocks using about 140MB of heap, which is close to 1.5GB per 1M blocks (see heap-dump-before.png).

      After the change outlined above, the same 100k blocks used about 20MB of heap, or about 200MB per million blocks (see heap-dump-after.png).

      A general DN sizing recommendation is 1GB of heap per 1M blocks, so the upgrade should be able to happen within the pre-upgrade heap.

      Attachments

        1. heap-dump-after.png
          75 kB
          Stephen O'Donnell
        2. heap-dump-before.png
          139 kB
          Stephen O'Donnell


            People

              sodonnell Stephen O'Donnell
              sodonnell Stephen O'Donnell


      Time Tracking

        Original Estimate: Not Specified
        Remaining Estimate: 0h
        Time Spent: 2h