  1. Hadoop HDFS
  2. HDFS-11661

GetContentSummary uses excessive amounts of memory

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.8.0, 3.0.0-alpha2
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Reverted HDFS-10797 to fix a scalability regression brought by the commit.

      Description

      ContentSummaryComputationContext::nodeIncluded() is being used to keep track of all INodes visited during the current content summary calculation. This can be all of the INodes in the filesystem, making for a VERY large hash table. This simply won't work on large filesystems.

      We noticed this when, after an upgrade, a namenode with ~100 million filesystem objects began spending significantly more time in GC. Fortunately this system had some memory breathing room; other clusters we have will not run with this additional demand on memory.

      This was added as part of HDFS-10797 as a way of keeping track of INodes that have already been accounted for - to avoid double counting.
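
      To illustrate why this scales poorly, here is a minimal sketch of the pattern described above. It is not the HDFS code: the class names, the Node type, and the buildTree helper are hypothetical stand-ins, and nodeIncluded is assumed to return true when a node has already been counted. The point is only that the tracking set ends up with one entry per node visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT the actual HDFS classes.
public class VisitedSetSketch {

    /** Minimal stand-in for an inode: directories have children, files have none. */
    static final class Node {
        final long id;
        final List<Node> children = new ArrayList<>();
        Node(long id) { this.id = id; }
    }

    /** Tracks every counted node so that nothing is counted twice. */
    static final class SummaryContext {
        private final Set<Long> included = new HashSet<>();
        long count = 0;

        /** Returns true if the node was already counted; otherwise records it. */
        boolean nodeIncluded(Node n) {
            return !included.add(n.id);
        }

        /** Number of entries held in the tracking set. */
        int trackedEntries() {
            return included.size();
        }
    }

    /** Iterative depth-first walk that counts each node at most once. */
    static void summarize(Node root, SummaryContext ctx) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (ctx.nodeIncluded(n)) {
                continue; // already accounted for, skip to avoid double counting
            }
            ctx.count++;
            for (Node child : n.children) {
                stack.push(child);
            }
        }
    }

    /** Builds a root with `dirs` directories, each holding `filesPerDir` files. */
    static Node buildTree(int dirs, int filesPerDir) {
        long nextId = 0;
        Node root = new Node(nextId++);
        for (int d = 0; d < dirs; d++) {
            Node dir = new Node(nextId++);
            for (int f = 0; f < filesPerDir; f++) {
                dir.children.add(new Node(nextId++));
            }
            root.children.add(dir);
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = buildTree(100, 1000);
        SummaryContext ctx = new SummaryContext();
        summarize(root, ctx);
        // The tracking set grows to one entry per node visited.
        System.out.println("nodes counted:   " + ctx.count);
        System.out.println("tracked entries: " + ctx.trackedEntries());
    }
}
```

      In this sketch the set size always equals the number of nodes counted, so a summary over the whole namespace holds an entry for every object in it for the duration of the call.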

        Attachments

        1. HDFS-11661.001.patch
          10 kB
          Wei-Chiu Chuang
        2. HDFs-11661.002.patch
          10 kB
          Wei-Chiu Chuang
        3. Heap growth.png
          184 kB
          Daryn Sharp

              People

              • Assignee:
                jojochuang Wei-Chiu Chuang
                Reporter:
                nroberts Nathan Roberts
              • Votes:
                0
                Watchers:
                19
