Hadoop Map/Reduce / MAPREDUCE-1374

Reduce memory footprint of FileSplit


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.21.0, 0.22.0

    Description

      There can be many FileSplit objects in memory at once: one per input split, so their number grows with the number of mappers.

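      Interning the host-name Strings would save a lot of memory on the JobTracker and JobClient, because the splits of a job refer to a comparatively small set of datanode host names, so many equal Strings could share a single instance.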

      FileInputFormat.java:
      
            for (NodeInfo host: hostList) {
              // Strip out the port number from the host name
      -        retVal[index++] = host.node.getName().split(":")[0];
      +        retVal[index++] = host.node.getName().split(":")[0].intern();
              if (index == replicationFactor) {
                done = true;
                break;
              }
            }
      

      More on String.intern(): http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
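A minimal, self-contained sketch of the effect of the proposed intern() call, assuming made-up datanode names, split count, and replication factor. InternHostsDemo and hostOf are hypothetical names, not Hadoop code. The sketch strips the port the same way the patched line does and then counts how many distinct String objects (by identity) remain.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class InternHostsDemo {
    // Hypothetical helper mirroring the patched line: drop ":port", then intern.
    static String hostOf(String nodeName) {
        return nodeName.split(":")[0].intern();
    }

    public static void main(String[] args) {
        // Made-up datanode names; real names come from the cluster topology.
        String[] nodeNames = {"dn1.example.com:50010", "dn2.example.com:50010", "dn3.example.com:50010"};
        int numSplits = 10_000;
        int replication = 3;
        String[][] hosts = new String[numSplits][replication];
        for (int i = 0; i < numSplits; i++) {
            for (int r = 0; r < replication; r++) {
                // new String(...) makes every input a distinct object before parsing.
                hosts[i][r] = hostOf(new String(nodeNames[(i + r) % nodeNames.length]));
            }
        }
        // Count distinct String objects by reference identity, not equals().
        Map<String, Boolean> distinct = new IdentityHashMap<>();
        for (String[] row : hosts) {
            for (String h : row) {
                distinct.put(h, Boolean.TRUE);
            }
        }
        System.out.println("host references: " + numSplits * replication); // 30000
        System.out.println("distinct String objects: " + distinct.size());  // 3
    }
}
```

With intern(), the 30,000 host references in this run point at only 3 String objects; without it, every split(":")[0] call would return its own copy. Pooling is a reasonable fit here because the set of host names is small and long-lived.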

      Changing the type of the file field from Path to String would also save a lot of memory: a Path wraps a java.net.URI, which internally holds about 10 String fields.

        private Path file;
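A sketch of the second suggestion, assuming a hypothetical CompactFileSplit class that keeps the file location as a plain String and parses it only on request. Hadoop's Path is deliberately not used so the example compiles with the JDK alone; java.net.URI stands in for the object that Path wraps. The field names and the sample location are illustrative assumptions.

```java
import java.net.URI;

public class CompactSplitDemo {
    // Hypothetical split class: stores the file location as a String instead of
    // a Path/URI object, and builds a URI only when a caller asks for one.
    static final class CompactFileSplit {
        private final String file;   // e.g. "hdfs://namenode:8020/data/part-00000"
        private final long start;
        private final long length;
        private final String[] hosts;

        CompactFileSplit(String file, long start, long length, String[] hosts) {
            this.file = file;
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }

        String getFileString() { return file; }

        // Parsed on demand; the resulting URI is not kept in the split.
        URI getFileUri() { return URI.create(file); }

        long getStart() { return start; }
        long getLength() { return length; }
        String[] getHosts() { return hosts; }
    }

    public static void main(String[] args) {
        CompactFileSplit split = new CompactFileSplit(
            "hdfs://namenode:8020/data/part-00000", 0L, 64L * 1024 * 1024,
            new String[] {"dn1", "dn2", "dn3"});
        URI uri = split.getFileUri();
        // prints: hdfs namenode /data/part-00000
        System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPath());
    }
}
```

The trade-off is a parse on each getFileUri() call in exchange for retaining only one String per split; in Hadoop code the equivalent accessor would presumably construct a Path from the stored String instead.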
      

      Attachments

        1. MAPREDUCE-1374.1.patch
          2 kB
          Zheng Shao
        2. MAPREDUCE-1374.2.patch
          4 kB
          Zheng Shao
        3. MAPREDUCE-1374.3.patch
          4 kB
          Zheng Shao


          People

            Assignee: Zheng Shao (zshao)
            Reporter: Zheng Shao (zshao)
            Votes: 0
            Watchers: 8
