Hadoop HDFS / HDFS-13752

fs.Path stores file paths in java.net.URI, which causes significant memory waste


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.6
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None
    • Environment: Hive 2.1.1 and Hadoop 2.7.6

    Description

      I was looking at HiveServer2 memory usage, and a large share of it was due to org.apache.hadoop.fs.Path, which stores each file path in a java.net.URI object. The URI implementation stores the same string in 3 different objects (see the attached image). In Hive, when there are many partitions, this causes high memory usage. In my particular case, 42% of memory was used by java.net.URI; storing each string once instead of three times could reduce that to roughly 14%.

      I wonder whether the community is open to replacing it with a more memory-efficient implementation, and what else should be considered here. It could be a significant memory improvement for both Hadoop and Hive.
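
      To make the idea concrete, here is a minimal sketch of one possible direction: keep a single String per path and build the java.net.URI only when a caller asks for it. The class name CompactPath and its methods are hypothetical and are not taken from the attached patches. The sketch also assumes the stored string is already a legal, encoded URI; the real Path performs normalization and escaping that is omitted here.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical illustration only; this is NOT the class from the attached patches.
 * It retains one String per path and creates a URI on demand instead of
 * holding a long-lived java.net.URI instance.
 */
final class CompactPath {
    // The only long-lived copy of the location.
    private final String location;

    CompactPath(String location) {
        if (location == null || location.isEmpty()) {
            throw new IllegalArgumentException("location must not be null or empty");
        }
        this.location = location;
    }

    /** Builds a short-lived URI; assumes the string is already a valid URI. */
    URI toUri() {
        try {
            return new URI(location);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns the final path component without materializing a URI. */
    String getName() {
        return location.substring(location.lastIndexOf('/') + 1);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CompactPath && location.equals(((CompactPath) o).location);
    }

    @Override
    public int hashCode() {
        return location.hashCode();
    }

    @Override
    public String toString() {
        return location;
    }
}
```

      The trade-off in this sketch is CPU for memory: every toUri() call re-parses the string, so callers that need the URI repeatedly would pay for parsing each time. Whether that cost is acceptable would have to be measured.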

      Attachments

        1. Screen Shot 2018-07-20 at 11.12.38.png
          150 kB
          Barnabas Maidics
        2. heapdump-100000partitions.html
          2.02 MB
          Misha Dmitriev
        3. measurement.pdf
          318 kB
          Barnabas Maidics
        4. HDFS-13752.001.patch
          11 kB
          Barnabas Maidics
        5. HDFS-13752.002.patch
          10 kB
          Barnabas Maidics
        6. HDFS-13752.003.patch
          13 kB
          Barnabas Maidics
        7. HDFSbenchmark.pdf
          833 kB
          Barnabas Maidics


          People

            Assignee: Barnabas Maidics
            Reporter: Barnabas Maidics
