  1. Hadoop Map/Reduce
  2. MAPREDUCE-7241

FileInputFormat listStatus with less memory footprint


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.1
    • Fix Version/s: 3.4.0
    • Component/s: job submission
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      This case is sometimes seen in Hive when a user mistakenly issues a query over all partitions. The file statuses cached while listing status can accumulate to over 3 GB. After digging into the memory dump, we found that LocatedBlock occupies about 50% (sometimes over 60%) of the memory retained by LocatedFileStatus (see the attached filestatus.png).

      Right now we only extract the block location info from LocatedFileStatus; the datanode infos (storage types) and the block token are not used. So there is no need to cache LocatedBlock, and the status can be shrunk like this:

      BlockLocation[] blockLocations = dup(stat.getBlockLocations());
      LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);

      // Copy each BlockLocation so the result no longer references the LocatedBlock.
      private static BlockLocation[] dup(BlockLocation[] blockLocations) {
          BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
          int i = 0;
          for (BlockLocation location : blockLocations) {
              copyLocs[i++] = new BlockLocation(location);
          }
          return copyLocs;
      }
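
      To illustrate why a plain copy is enough to release the LocatedBlock, here is a minimal, self-contained sketch. It does not use Hadoop: BlockLocation and HeavyBlockLocation below are hypothetical stand-ins, written under the assumption that the HDFS client returns a BlockLocation subclass that keeps a reference to its LocatedBlock, and that the BlockLocation copy constructor copies only the base-class fields. The byte array merely models the retained LocatedBlock.

```java
import java.util.Arrays;

final class ShrinkDemo {
    // Hypothetical stand-in for the base location class: only what split
    // computation needs (hosts, offset, length).
    static class BlockLocation {
        final String[] hosts;
        final long offset;
        final long length;

        BlockLocation(String[] hosts, long offset, long length) {
            this.hosts = hosts;
            this.offset = offset;
            this.length = length;
        }

        // Copy constructor: copies base-class fields only, so anything a
        // subclass holds is not carried over.
        BlockLocation(BlockLocation that) {
            this(that.hosts, that.offset, that.length);
        }
    }

    // Hypothetical stand-in for an HDFS-specific subclass that also retains
    // a heavy payload (datanode infos, storage types, block token).
    static class HeavyBlockLocation extends BlockLocation {
        final byte[] locatedBlock; // models the retained LocatedBlock

        HeavyBlockLocation(String[] hosts, long offset, long length, byte[] locatedBlock) {
            super(hosts, offset, length);
            this.locatedBlock = locatedBlock;
        }
    }

    // Same shape as dup() in the description: one light copy per location.
    static BlockLocation[] dup(BlockLocation[] blockLocations) {
        BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
        int i = 0;
        for (BlockLocation location : blockLocations) {
            copyLocs[i++] = new BlockLocation(location);
        }
        return copyLocs;
    }

    public static void main(String[] args) {
        BlockLocation[] heavy = {
            new HeavyBlockLocation(new String[] {"dn1", "dn2"}, 0L, 128L, new byte[1024])
        };
        BlockLocation[] light = dup(heavy);
        // The copies are plain BlockLocation instances; once `heavy` is
        // dropped, the payload becomes eligible for garbage collection.
        System.out.println(light[0].getClass().getSimpleName()); // prints BlockLocation
        System.out.println(Arrays.toString(light[0].hosts));     // prints [dn1, dn2]
    }
}
```

      The copies still share the hosts array with the originals, which is harmless here because only the heavy payload needs to be released.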

       

      Attachments

        1. filestatus.png
          110 kB
          Zhihua Deng
        2. MAPREDUCE-7241.trunk.patch
          5 kB
          Zhihua Deng
        3. MAPREDUCE-7241.trunk.02.patch
          4 kB
          Zhihua Deng
        4. MAPREDUCE-7241.03.patch
          7 kB
          Zhihua Deng
        5. MAPREDUCE-7241.04.patch
          7 kB
          Zhihua Deng
        6. MAPREDUCE-7241.05.patch
          10 kB
          Zhihua Deng
        7. MAPREDUCE-7241.06.patch
          11 kB
          Zhihua Deng


            People

              Assignee: Zhihua Deng (dengzh)
              Reporter: Zhihua Deng (dengzh)
              Votes: 0
              Watchers: 6
