Hadoop HDFS / HDFS-8859

Improve DataNode ReplicaMap memory footprint to save about 45%

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Using the following approach, we can save about 45% of the memory footprint of each block replica in DataNode memory (this JIRA covers only the ReplicaMap in the DataNode). The details are:

      In ReplicaMap:

      private final Map<String, Map<Long, ReplicaInfo>> map =
          new HashMap<String, Map<Long, ReplicaInfo>>();
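      To make the redundancy concrete, here is a minimal sketch of the current two-level layout. It assumes, as in ReplicaMap, that the outer String key is the block pool ID; ReplicaInfoStub is a hypothetical stand-in for ReplicaInfo that holds only the block ID, and the IDs in main are illustrative. Each replica is reachable only through a HashMap entry whose boxed Long key repeats a value the replica already stores.

```java
import java.util.HashMap;
import java.util.Map;

class CurrentReplicaMapSketch {
  /** Hypothetical stand-in for ReplicaInfo: it already stores its block ID. */
  static class ReplicaInfoStub {
    final long blockId;
    ReplicaInfoStub(long blockId) { this.blockId = blockId; }
  }

  // Outer key: block pool ID. Inner key: block ID, boxed into a Long that
  // duplicates replica.blockId and is referenced from its own HashMap entry.
  private final Map<String, Map<Long, ReplicaInfoStub>> map =
      new HashMap<String, Map<Long, ReplicaInfoStub>>();

  void add(String bpid, ReplicaInfoStub replica) {
    // Autoboxing allocates a new Long here unless the ID is in
    // Long.valueOf's guaranteed cache range (-128..127).
    map.computeIfAbsent(bpid, k -> new HashMap<>()).put(replica.blockId, replica);
  }

  ReplicaInfoStub get(String bpid, long blockId) {
    Map<Long, ReplicaInfoStub> replicas = map.get(bpid);
    return replicas == null ? null : replicas.get(blockId);
  }

  public static void main(String[] args) {
    CurrentReplicaMapSketch rm = new CurrentReplicaMapSketch();
    rm.add("BP-1", new ReplicaInfoStub(1073741825L));
    System.out.println(rm.get("BP-1", 1073741825L).blockId); // 1073741825
  }
}
```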
      

      Currently we use a HashMap (Map<Long, ReplicaInfo>) to store the replicas in memory. Its key is the block ID of the replica, which is already stored in ReplicaInfo, so the memory used for the key can be saved. Each HashMap Entry also carries its own object overhead. We can implement a lightweight set similar to LightWeightGSet, but not fixed-size, which also lets us get an element by its key. (LightWeightGSet uses a fixed size for its entries array, usually a large one, as in BlocksMap; because the array never needs to be resized, this avoids full GC.)
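      A minimal sketch of such a set, assuming hypothetical ReplicaSet and Replica classes rather than the actual HDFS code: each element carries its own block ID, which serves as the key, plus a single next reference that chains elements in the same bucket, so no Entry objects or boxed Long keys are allocated, and get(blockId) finds an element by key. Unlike LightWeightGSet, the bucket array starts small and doubles once the element count would exceed 75% of its length (an assumed threshold).

```java
/**
 * Sketch of an intrusive, resizable hash set keyed by block ID.
 * ReplicaSet and Replica are hypothetical names, not HDFS classes.
 */
class ReplicaSet {
  /** Stand-in for ReplicaInfo: the key lives inside the element. */
  static class Replica {
    final long blockId;
    Replica next; // the single extra reference per element

    Replica(long blockId) { this.blockId = blockId; }
  }

  private Replica[] buckets = new Replica[16]; // length is always a power of two
  private int size;

  private static int index(long blockId, int length) {
    int h = Long.hashCode(blockId);
    h ^= h >>> 16; // mix high bits into the low bits used by the mask
    return h & (length - 1);
  }

  Replica get(long blockId) {
    for (Replica r = buckets[index(blockId, buckets.length)]; r != null; r = r.next) {
      if (r.blockId == blockId) return r;
    }
    return null;
  }

  /** Adds e, replacing any element with the same block ID; returns the replaced one or null. */
  Replica put(Replica e) {
    Replica old = remove(e.blockId);
    if (size + 1 > buckets.length * 3 / 4) resize(); // assumed 0.75 load factor
    int i = index(e.blockId, buckets.length);
    e.next = buckets[i];
    buckets[i] = e;
    size++;
    return old;
  }

  Replica remove(long blockId) {
    int i = index(blockId, buckets.length);
    Replica prev = null;
    for (Replica r = buckets[i]; r != null; prev = r, r = r.next) {
      if (r.blockId == blockId) {
        if (prev == null) buckets[i] = r.next; else prev.next = r.next;
        r.next = null;
        size--;
        return r;
      }
    }
    return null;
  }

  int size() { return size; }

  /** Doubles the bucket array and relinks every element (no per-element allocation). */
  private void resize() {
    Replica[] old = buckets;
    buckets = new Replica[old.length * 2];
    for (Replica head : old) {
      Replica r = head;
      while (r != null) {
        Replica following = r.next; // save before r is relinked
        int i = index(r.blockId, buckets.length);
        r.next = buckets[i];
        buckets[i] = r;
        r = following;
      }
    }
  }

  public static void main(String[] args) {
    ReplicaSet set = new ReplicaSet();
    for (long id = 1; id <= 100; id++) set.put(new Replica(id)); // triggers several resizes
    System.out.println(set.size());          // 100
    System.out.println(set.get(42).blockId); // 42
    set.remove(42);
    System.out.println(set.get(42));         // null
  }
}
```

      Each resize allocates a new array and relinks every element; that is the cost LightWeightGSet's large fixed-size array avoids, traded here for not pre-allocating a big array.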

      Here is the per-replica memory comparison if we implement a lightweight set as described:

      We can save:

          SIZE (bytes)   ITEM
          20             The key: Long (12 bytes object overhead + 8 bytes long)
          12             HashMap Entry object overhead
          4              Reference to the key in Entry
          4              Reference to the value in Entry
          4              hash in Entry
      

      Total: -44 bytes

      We need to add:

          SIZE (bytes)   ITEM
          4              Reference to the next element in ReplicaInfo
      

      Total: +4 bytes

      So in total we save 40 bytes per block replica.

      Currently, one finalized replica needs around 46 bytes (ignoring memory alignment).

      We can therefore save 1 - (4 + 46) / (44 + 46) = 1 - 50/90 ≈ 44.4%, i.e. roughly 45% of the memory for each block replica in the DataNode.
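      The arithmetic above can be checked with a few lines of Java. The byte counts are exactly the figures from the tables (12-byte object headers and 4-byte references, consistent with a 64-bit JVM using compressed references) and 46 bytes for a finalized replica, with alignment ignored as in the description; the class and method names are illustrative.

```java
/**
 * Reproduces the per-replica arithmetic from the description.
 * Byte counts are the description's figures; alignment is ignored.
 */
class FootprintEstimate {
  /** Bytes per replica that disappear when the HashMap entry and boxed key go away. */
  static int removedBytes() {
    int longKey = 12 + 8;  // Long: object header + long value
    int entryHeader = 12;  // HashMap entry object header
    int keyRef = 4, valueRef = 4, hashField = 4;
    return longKey + entryHeader + keyRef + valueRef + hashField; // 44
  }

  /** Bytes per replica added: one "next" reference in ReplicaInfo. */
  static int addedBytes() {
    return 4;
  }

  /** Fraction of memory saved for a replica whose own size is replicaBytes. */
  static double savingFraction(int replicaBytes) {
    double before = removedBytes() + replicaBytes; // replica + map overhead today
    double after = addedBytes() + replicaBytes;    // replica + next reference
    return 1.0 - after / before;
  }

  public static void main(String[] args) {
    System.out.println(removedBytes() - addedBytes());       // 40
    System.out.printf("%.1f%%%n", savingFraction(46) * 100); // 44.4%
  }
}
```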

        Attachments

        1. HDFS-8859.006.patch
          34 kB
          Yi Liu
        2. HDFS-8859.005.patch
          34 kB
          Yi Liu
        3. HDFS-8859.004.patch
          34 kB
          Yi Liu
        4. HDFS-8859.003.patch
          30 kB
          Yi Liu
        5. HDFS-8859.002.patch
          27 kB
          Yi Liu
        6. HDFS-8859.001.patch
          26 kB
          Yi Liu


            People

            • Assignee:
              hitliuyi Yi Liu
              Reporter:
              hitliuyi Yi Liu
            • Votes:
              0
              Watchers:
              19
