  1. Hadoop HDFS
  2. HDFS-12922

Arrays of length 1 cause 9.2% memory overhead

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I recently obtained a large (over 60 GiB) heap dump from a customer and analyzed it using JXRay (www.jxray.com). One source of memory waste that the tool detected is arrays of length 1 that come from BlockInfo[] org.apache.hadoop.hdfs.server.namenode.INodeFile.blocks and INode$Feature[] org.apache.hadoop.hdfs.server.namenode.INodeFile.features. Only a small fraction of these arrays (less than 10%) have a length greater than 1. Collectively, these arrays waste 5.5 GiB, or 9.2% of the heap. See the attached screenshot for more details.

      An array of length 1 is problematic because every array in the JVM has a header that takes between 16 and 20 bytes, depending on the JVM configuration. For a big enough array this 16-20 byte overhead is not a concern, but if the array has only one element (which takes 4-8 bytes, depending on the JVM configuration), the overhead becomes bigger than the array's "workload".
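The arithmetic above can be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, the header and reference sizes are the figures quoted in this description, and the 8-byte object alignment is an assumption about a typical JVM configuration rather than something measured from the heap dump.

```java
/** Hypothetical helper that estimates array footprint; not part of Hadoop. */
final class ArrayOverhead {
    // Assumption: objects are padded to an 8-byte boundary.
    static final int ALIGNMENT = 8;

    /** Rounds a size up to the next multiple of ALIGNMENT. */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Estimated bytes occupied by a reference array of the given length. */
    static long arraySize(int headerBytes, int refBytes, int length) {
        return alignUp(headerBytes + (long) refBytes * length);
    }

    /** Bytes that are not element references (header plus padding). */
    static long overhead(int headerBytes, int refBytes, int length) {
        return arraySize(headerBytes, refBytes, length) - (long) refBytes * length;
    }

    public static void main(String[] args) {
        // 16-byte header and 4-byte references, as in the low end of the ranges above.
        System.out.println("length 1:    overhead = " + overhead(16, 4, 1) + " bytes");
        System.out.println("length 1000: overhead = " + overhead(16, 4, 1000) + " bytes");
    }
}
```

Under these assumptions, the single 4-byte reference in a length-1 array is outweighed several times over by header and padding, whereas for a 1000-element array the same fixed cost is negligible.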

      In such a situation, it makes sense to replace the array data field Foo[] ar with an Object obj, which would contain either a direct reference to the array's single workload element, or a reference to the array if there is more than one element. This change will require further code changes and type casts. For example, code like return ar[i]; becomes return (obj instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i]; and so on. This doesn't look very pretty, but as far as I can see, the code that deals with e.g. INodeFile.blocks already contains various null checks, etc., so the change would not make the code much less readable.
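A minimal sketch of this idea is shown below. It is not the actual INodeFile code: Foo stands in for an element type such as BlockInfo, and CompactFooList, size, get and add are hypothetical names chosen for illustration. The sketch also assumes that null represents the empty case.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an element type such as BlockInfo. */
final class Foo {
    final long id;

    Foo(long id) {
        this.id = id;
    }
}

/** Hypothetical holder that avoids allocating an array for a single element. */
final class CompactFooList {
    // null: no elements; Foo: exactly one element; Foo[]: two or more elements.
    private Object obj;

    int size() {
        if (obj == null) {
            return 0;
        }
        return (obj instanceof Foo) ? 1 : ((Foo[]) obj).length;
    }

    Foo get(int i) {
        if (i < 0 || i >= size()) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        // Same expression as in the description: a direct reference or an array lookup.
        return (obj instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i];
    }

    void add(Foo f) {
        if (f == null) {
            throw new NullPointerException("f");
        }
        if (obj == null) {
            obj = f;                               // first element: no array needed
        } else if (obj instanceof Foo) {
            obj = new Foo[] { (Foo) obj, f };      // second element: switch to an array
        } else {
            Foo[] old = (Foo[]) obj;
            Foo[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = f;
            obj = grown;
        }
    }
}
```

The instanceof check distinguishes the two representations safely because a Foo[] is never an instance of Foo. The cost is an extra type check and cast on each access, which is the readability trade-off mentioned above.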

        Attachments

        1. screenshot-1.png
          261 kB
          Misha Dmitriev

            People

            • Assignee:
              misha@cloudera.com Misha Dmitriev
            • Reporter:
              misha@cloudera.com Misha Dmitriev
            • Votes:
              0
            • Watchers:
              15

              Dates

              • Created:
                Updated: