  1. Hadoop Map/Reduce
  2. MAPREDUCE-5969

Private non-archive files' sizes are added twice in the Distributed Cache directory size calculation.


Details

    • Bug
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • None
    • None
    • mrv1

    Description

      The sizes of private non-archive files are added twice when the Distributed Cache directory size is calculated. The list of private non-archive files is passed in with the "-files" command-line option. The Distributed Cache directory size is used to check whether the total size of the cached files exceeds the cache size limit, which defaults to 10 GB.
      I added logging to addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java.
      I used the following command to test:
      hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
      This adds two files to the distributed cache: WordCount.java and wordcount.jar.
      WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total should be 6260 bytes.
      The log shows that the sizes of these files are added twice,
      once before they are downloaded to the local node and again after the download, so the total file count becomes 4 instead of 2:
      addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
      addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
      addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
      In the code, the size of a private non-archive file is first added in
      getLocalCache:

                  if (!isArchive) {
                    //for private archives, the lengths come over RPC from the 
                    //JobLocalizer since the JobLocalizer is the one who expands
                    //archives and gets the total length
                    lcacheStatus.size = fileStatus.getLen();
      
                    LOG.info("getLocalCache:" + localizedPath + " size = "
                        + lcacheStatus.size);
                    // Increase the size and sub directory count of the cache
                    // from baseDirSize and baseDirNumberSubDir.
                    baseDirManager.addCacheInfoUpdate(lcacheStatus);
                  }
      

      The size is added a second time in
      setSize:

            synchronized (status) {
              status.size = size;
              baseDirManager.addCacheInfoUpdate(status);
            }
      

      The fix is to not add the file size again for a private non-archive file after the download (downloadCacheObject).
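
      The effect on the accounting can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code: CacheAccountingSketch, BaseDirAccounting, sampleFiles, and localize are hypothetical names, and only addCacheInfoUpdate borrows its name from the report. The sketch also assumes the second addition uses the same length as the first, so its doubled total (12520) is illustrative and does not reproduce the logged 12588.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the accounting; not the Hadoop classes.
final class CacheAccountingSketch {

    // Stand-in for the per-base-directory counters (total size and entry count).
    static final class BaseDirAccounting {
        long totalSize;
        int entryCount;

        // Mirrors the idea of addCacheInfoUpdate: every call adds a size and one entry.
        void addCacheInfoUpdate(long size) {
            totalSize += size;
            entryCount++;
        }
    }

    // The two files and sizes quoted in the report.
    static Map<String, Long> sampleFiles() {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("WordCount.java", 2395L);
        files.put("wordcount.jar", 3865L);
        return files;
    }

    // addAgainAfterDownload = true models the reported behaviour,
    // false models the proposed fix (size is added only once per file).
    static BaseDirAccounting localize(Map<String, Long> files, boolean addAgainAfterDownload) {
        BaseDirAccounting dir = new BaseDirAccounting();
        // First addition: before the download (the getLocalCache path).
        for (long length : files.values()) {
            dir.addCacheInfoUpdate(length);
        }
        // Second addition: after the download (the setSize path).
        if (addAgainAfterDownload) {
            for (long length : files.values()) {
                dir.addCacheInfoUpdate(length);
            }
        }
        return dir;
    }

    public static void main(String[] args) {
        BaseDirAccounting buggy = localize(sampleFiles(), true);
        BaseDirAccounting fixed = localize(sampleFiles(), false);
        System.out.println("double add: size=" + buggy.totalSize + " num=" + buggy.entryCount);
        System.out.println("single add: size=" + fixed.totalSize + " num=" + fixed.entryCount);
    }
}
```

      With the second addition skipped, the sketch reports a size of 6260 and a count of 2, which matches the expected total above.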

      Attachments

        1. MAPREDUCE-5969.branch1.1.patch
          5 kB
          Zhihai Xu
        2. MAPREDUCE-5969.branch1.patch
          5 kB
          Zhihai Xu

          People

            zxu Zhihai Xu
            zxu Zhihai Xu