Hadoop HDFS / HDFS-12922

Arrays of length 1 cause 9.2% memory overhead



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved


      I recently obtained a large (over 60 GiB) heap dump from a customer and analyzed it with jxray (www.jxray.com). One source of memory waste that the tool detected is arrays of length 1 referenced from BlockInfo[] org.apache.hadoop.hdfs.server.namenode.INodeFile.blocks and INode$Feature[] org.apache.hadoop.hdfs.server.namenode.INodeFile.features. Fewer than 10% of these arrays have a length greater than 1. Collectively, these arrays waste 5.5 GiB, or 9.2% of the heap. See the attached screenshot for more details.

      An array of length 1 is problematic because every array in the JVM has a header that takes between 16 and 20 bytes, depending on the JVM configuration. For a large enough array this 16-20 byte overhead does not matter, but if the array has only one element (which takes 4-8 bytes, again depending on the JVM configuration), the overhead is bigger than the array's actual payload.
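To make the size argument concrete, the following Java sketch computes the footprint of a one-element reference array. The two configurations (16-byte header with 4-byte references, 20-byte header with 8-byte references) are assumed combinations within the ranges quoted above, the 8-byte object alignment is an assumption about a typical 64-bit HotSpot JVM, and the class name is hypothetical.

```java
/** Hypothetical helper: rough footprint arithmetic for a one-element reference array. */
final class SingletonArrayOverhead {

    /** Rounds a size up to the (assumed) 8-byte object alignment. */
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    /** Bytes occupied by a reference array, given header size, reference size and length. */
    static long arrayBytes(int headerBytes, int refBytes, int length) {
        return align8(headerBytes + (long) refBytes * length);
    }

    public static void main(String[] args) {
        // Assumed configurations: {header bytes, reference bytes}.
        // 16/4 ~ compressed references, 20/8 ~ uncompressed references.
        int[][] configs = { {16, 4}, {20, 8} };
        for (int[] c : configs) {
            long total = arrayBytes(c[0], c[1], 1);
            System.out.printf("header=%d ref=%d: 1-element array = %d bytes, payload = %d bytes%n",
                    c[0], c[1], total, c[1]);
        }
    }
}
```

Under these assumptions it prints 24 bytes for a 4-byte payload and 32 bytes for an 8-byte payload, so storing the element directly would save the whole array object in each case. A tool such as JOL (Java Object Layout) can report the actual layout on a given JVM.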

      In such a situation it makes sense to replace the array field Foo[] ar with a field Object obj, which holds either a direct reference to the single element, or a reference to the array if there is more than one element. This requires further code changes and type casts. For example, code like return ar[i]; becomes return (obj instanceof Foo) ? (Foo) obj : ((Foo[]) obj)[i]; and so on. This doesn't look very pretty, but as far as I can see, the code that deals with e.g. INodeFile.blocks already contains various null checks, so the change will not make it much less readable.
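A minimal sketch of the proposed representation follows, assuming the element type is never itself an array and elements are non-null. Foo and CompactFooList are hypothetical names standing in for, e.g., BlockInfo and the INodeFile.blocks field; this illustrates the idea and is not HDFS code.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical element type standing in for BlockInfo or INode.Feature. */
final class Foo {
    final long id;
    Foo(long id) { this.id = id; }
}

/** Hypothetical holder: one Object field that is null, a single Foo, or a Foo[] of length >= 2. */
final class CompactFooList {
    private Object obj;

    int size() {
        if (obj == null) return 0;
        return (obj instanceof Foo) ? 1 : ((Foo[]) obj).length;
    }

    Foo get(int i) {
        if (obj instanceof Foo) {
            // Single element stored directly: only index 0 is valid.
            if (i != 0) throw new IndexOutOfBoundsException("Index: " + i + ", size: 1");
            return (Foo) obj;
        }
        if (obj == null) throw new IndexOutOfBoundsException("Index: " + i + ", size: 0");
        return ((Foo[]) obj)[i]; // array access performs its own bounds check
    }

    void add(Foo f) {
        Objects.requireNonNull(f, "elements must be non-null");
        if (obj == null) {
            obj = f;                              // first element: no array allocated
        } else if (obj instanceof Foo) {
            obj = new Foo[] { (Foo) obj, f };     // second element: switch to an array
        } else {
            Foo[] old = (Foo[]) obj;
            Foo[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = f;
            obj = grown;
        }
    }
}
```

The common single-element case allocates no array at all; every access pays for an instanceof check and a cast instead, which is the readability cost described above.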





            Assignee: Misha Dmitriev (misha@cloudera.com)
            Reporter: Misha Dmitriev (misha@cloudera.com)



