Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: harchive
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Each HAR file system has two index files that contain information on how files are stored in the part files. During the block location calculation, these indexes are reread for every file in the archive. Caching the indexes and the status of the part files will greatly reduce the number of name node operations during job setup.
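
      The caching idea described above can be illustrated with a minimal sketch. It is not the code from the attached patches: the class name HarIndexCacheSketch, the ArchiveMeta type, and the indexReader function are hypothetical stand-ins, and the real index format and Hadoop FileSystem calls are deliberately left out. The sketch only shows that, once the parsed index is cached per archive, looking up many files in the same archive triggers a single index read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: parse an archive's index once and reuse the result. */
class HarIndexCacheSketch {

    /** Parsed index contents: file name inside the archive -> part file storing it. */
    static final class ArchiveMeta {
        final Map<String, String> fileToPart;

        ArchiveMeta(Map<String, String> fileToPart) {
            this.fileToPart = fileToPart;
        }
    }

    // One cached entry per archive URI (assumption: the URI identifies the archive).
    private final Map<String, ArchiveMeta> cache = new ConcurrentHashMap<>();

    // Stand-in for the expensive step that reads and parses the index files.
    private final Function<String, ArchiveMeta> indexReader;

    // Counts how often the index was actually read, for illustration only.
    final AtomicInteger indexReads = new AtomicInteger();

    HarIndexCacheSketch(Function<String, ArchiveMeta> indexReader) {
        this.indexReader = indexReader;
    }

    /** Returns the part file holding fileName, or null if the index has no such entry. */
    String partFileFor(String archiveUri, String fileName) {
        // computeIfAbsent runs the reader at most once per archive URI.
        ArchiveMeta meta = cache.computeIfAbsent(archiveUri, uri -> {
            indexReads.incrementAndGet();
            return indexReader.apply(uri);
        });
        return meta.fileToPart.get(fileName);
    }
}
```

      Without the cache, each call to partFileFor would invoke indexReader again, which corresponds to the per-file rereading that the description identifies as the source of extra name node operations.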

      1. MAPREDUCE-2459.2.patch
        16 kB
        Mac Yang
      2. MAPREDUCE-2459.1.patch
        15 kB
        Mac Yang

        Issue Links

          Activity

          Mac Yang created issue -
          Mac Yang made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-2459.1.patch [ 12477799 ]
          Mac Yang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12477799/MAPREDUCE-2459.1.patch
          against trunk revision 1097679.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestMRCLI
          org.apache.hadoop.tools.TestHadoopArchives
          org.apache.hadoop.tools.TestHarFileSystem

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/199//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/199//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/199//console

          This message is automatically generated.

          Mahadev konar added a comment -

          Mac, it looks like the tests are failing (especially TestHarFileSystem). The patch looks good to me. Is there any particular reason for using an _ in front of the following variable?

          _harMetaCache

          Also, is this meant for trunk only?

          Mahadev konar made changes -
          Affects Version/s 0.23.0 [ 12315570 ]
          Mahadev konar made changes -
          Affects Version/s 0.23.0 [ 12315570 ]
          Fix Version/s 0.23.0 [ 12315570 ]
          Mac Yang made changes -
          Attachment MAPREDUCE-2459.2.patch [ 12479166 ]
          Mac Yang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Mac Yang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Mac Yang added a comment -

          Mahadev, thanks for the feedback. I have updated the patch to include the following changes:

          • Removed '_' from harMetaCache
          • Added a modification timestamp check that reparses the index files if necessary. This addresses the case where the archive is overwritten between two reads from the same process
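
          For readers following the thread, the timestamp check mentioned in the last bullet can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: TimestampCheckedCacheSketch, modTimeOf, and parser are hypothetical names, and the modification time lookup stands in for whatever file status call the real code uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: reuse a parsed index only while its modification time is unchanged. */
class TimestampCheckedCacheSketch<V> {

    /** A cached value together with the modification time it was parsed at. */
    private static final class Entry<V> {
        final long modTime;
        final V value;

        Entry(long modTime, V value) {
            this.modTime = modTime;
            this.value = value;
        }
    }

    private final Map<String, Entry<V>> cache = new HashMap<>();
    private final ToLongFunction<String> modTimeOf; // stand-in for a file status lookup
    private final Function<String, V> parser;       // stand-in for index parsing

    // Counts how often the parser ran, for illustration only.
    int parses = 0;

    TimestampCheckedCacheSketch(ToLongFunction<String> modTimeOf, Function<String, V> parser) {
        this.modTimeOf = modTimeOf;
        this.parser = parser;
    }

    synchronized V get(String indexPath) {
        long current = modTimeOf.applyAsLong(indexPath);
        Entry<V> entry = cache.get(indexPath);
        // Reparse when nothing is cached yet or the file changed since it was cached.
        if (entry == null || entry.modTime != current) {
            parses++;
            entry = new Entry<>(current, parser.apply(indexPath));
            cache.put(indexPath, entry);
        }
        return entry.value;
    }
}
```

          In this sketch every get still performs one modification time lookup, so the saving comes from skipping the reread and reparse of the index, not from avoiding the status check.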
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12479166/MAPREDUCE-2459.2.patch
          against trunk revision 1102515.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/245//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/245//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/245//console

          This message is automatically generated.

          Mahadev konar added a comment -

          +1 lgtm. I'll commit it to trunk.

          Mahadev konar added a comment -

          I just committed this to trunk. Thanks, Mac!

          Mahadev konar made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #690 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/690/)
          MAPREDUCE-2459. Cache HAR filesystem metadata. (Mac Yang via mahadev)

          mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125428
          Files :

          • /hadoop/mapreduce/trunk/CHANGES.txt
          • /hadoop/mapreduce/trunk/src/tools/org/apache/hadoop/fs/HarFileSystem.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #686 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/686/)
          MAPREDUCE-2459. Cache HAR filesystem metadata. (Mac Yang via mahadev)

          mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125428
          Files :

          • /hadoop/mapreduce/trunk/CHANGES.txt
          • /hadoop/mapreduce/trunk/src/tools/org/apache/hadoop/fs/HarFileSystem.java
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Jason Lowe made changes -
          Link This issue is duplicated by MAPREDUCE-865 [ MAPREDUCE-865 ]
          Jason Lowe made changes -
          Link This issue is related to HADOOP-9757 [ HADOOP-9757 ]

            People

            • Assignee:
              Mac Yang
              Reporter:
              Mac Yang
            • Votes:
              0
              Watchers:
              6
