Hadoop Common / HADOOP-12810

FileSystem#listLocatedStatus causes unnecessary RPC calls

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.2
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
    • Component/s: fs, fs/s3
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      FileSystem#listLocatedStatus lists the files in a directory and then calls getFileBlockLocations(stat.getPath(), ...) for each file instead of getFileBlockLocations(stat, ...). The Path overload only calls getFileStatus to obtain a FileStatus for the path and then delegates to the FileStatus overload, so every listed file incurs an unnecessary getFileStatus call even though the listing has already returned its status.
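The redundant lookup can be illustrated with a small self-contained Java model. This is a sketch under stated assumptions, not Hadoop code: `ListingCostDemo`, `Status`, `CountingFs`, and the two `blockLocations` overloads are hypothetical stand-ins for `FileSystem`, `FileStatus`, `getFileStatus`, and the `getFileBlockLocations(Path, ...)` / `getFileBlockLocations(FileStatus, ...)` pair. The model only counts how many per-file status lookups a listing performs when block locations are requested by path versus by the status the listing already returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the listLocatedStatus call pattern; not the Hadoop API.
class ListingCostDemo {
    public static void main(String[] args) {
        for (boolean passStatus : new boolean[] {false, true}) {
            CountingFs fs = new CountingFs(1000);
            fs.listLocated(passStatus);
            String label = passStatus ? "by status: " : "by path:   ";
            System.out.println(label + fs.statusCalls + " extra status lookups");
        }
    }

    // Stand-in for FileStatus: a path plus a length.
    static final class Status {
        final String path;
        final long len;

        Status(String path, long len) {
            this.path = path;
            this.len = len;
        }
    }

    // Stand-in for a file system whose per-file status lookup is expensive.
    static class CountingFs {
        int statusCalls = 0;           // getFileStatus-style lookups performed
        private final int fileCount;   // files in the simulated directory

        CountingFs(int fileCount) {
            this.fileCount = fileCount;
        }

        // Models the directory listing: one call returns every file's status.
        List<Status> list() {
            List<Status> out = new ArrayList<>();
            for (int i = 0; i < fileCount; i++) {
                out.add(new Status("/dir/part-" + i, 1024L));
            }
            return out;
        }

        // Models getFileStatus(Path): the lookup that is costly on S3.
        Status getStatus(String path) {
            statusCalls++;
            return new Status(path, 1024L);
        }

        // Models getFileBlockLocations(FileStatus, ...): uses the status it is given.
        String[] blockLocations(Status s, long start, long len) {
            return new String[] {"localhost"};
        }

        // Models getFileBlockLocations(Path, ...): looks the status up again, then delegates.
        String[] blockLocations(String path, long start, long len) {
            return blockLocations(getStatus(path), start, len);
        }

        // Models listLocatedStatus; passStatus=false is the old behaviour, true the fix.
        int listLocated(boolean passStatus) {
            int located = 0;
            for (Status s : list()) {
                String[] locs = passStatus
                        ? blockLocations(s, 0, s.len)
                        : blockLocations(s.path, 0, s.len);
                located += locs.length;
            }
            return located;
        }
    }
}
```

With 1,000 simulated files, the path-based variant performs 1,000 extra lookups and the status-based variant performs none. On a store where each lookup is a remote request, that per-file difference is the cost the patch removes.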

      This is particularly bad for S3, where getFileStatus is expensive. Avoiding the extra call improved input split calculation time for a data set in S3 by ~20x: from 10 minutes to 25 seconds.
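A quick check of the figures quoted above: the exact ratio of the two split-calculation times is 24, which the description states as ~20x.

```latex
\text{speedup} = \frac{10\ \text{min}}{25\ \text{s}} = \frac{600\ \text{s}}{25\ \text{s}} = 24
```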

          Activity

          rdblue Ryan Blue added a comment -

          Adding a patch that fixes the problem.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem    Runtime  Comment
          0     reexec       0m 9s    Docker mode activated.
          +1    @author      0m 0s    The patch does not contain any @author tags.
          -1    test4tests   0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1    mvninstall   7m 8s    trunk passed
          +1    compile      6m 43s   trunk passed with JDK v1.8.0_72
          +1    compile      7m 24s   trunk passed with JDK v1.7.0_95
          +1    checkstyle   0m 23s   trunk passed
          +1    mvnsite      1m 10s   trunk passed
          +1    mvneclipse   0m 14s   trunk passed
          +1    findbugs     1m 44s   trunk passed
          +1    javadoc      1m 3s    trunk passed with JDK v1.8.0_72
          +1    javadoc      1m 11s   trunk passed with JDK v1.7.0_95
          +1    mvninstall   0m 47s   the patch passed
          +1    compile      8m 34s   the patch passed with JDK v1.8.0_72
          +1    javac        8m 34s   the patch passed
          +1    compile      7m 54s   the patch passed with JDK v1.7.0_95
          +1    javac        7m 54s   the patch passed
          +1    checkstyle   0m 25s   the patch passed
          +1    mvnsite      1m 7s    the patch passed
          +1    mvneclipse   0m 14s   the patch passed
          +1    whitespace   0m 0s    Patch has no whitespace issues.
          +1    findbugs     2m 0s    the patch passed
          +1    javadoc      1m 2s    the patch passed with JDK v1.8.0_72
          +1    javadoc      1m 8s    the patch passed with JDK v1.7.0_95
          -1    unit         17m 32s  hadoop-common in the patch failed with JDK v1.8.0_72.
          +1    unit         8m 56s   hadoop-common in the patch passed with JDK v1.7.0_95.
          +1    asflicense   0m 24s   Patch does not generate ASF License warnings.
                             78m 25s  (total runtime)



          Reason                               Tests
          JDK v1.8.0_72 Timed out junit tests  org.apache.hadoop.http.TestHttpServerLogs



          Subsystem                   Report/Notes
          Docker                      Image:yetus/hadoop:0ca8df7
          JIRA Patch URL              https://issues.apache.org/jira/secure/attachment/12788124/HADOOP-12810.1.patch
          JIRA Issue                  HADOOP-12810
          Optional Tests              asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname                       Linux 1ac5921e9613 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool                  maven
          Personality                 /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision                trunk / 4b0e59f
          Default Java                1.7.0_95
          Multi-JDK versions          /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs                    v3.0.0
          unit                        https://builds.apache.org/job/PreCommit-HADOOP-Build/8635/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_72.txt
          unit test logs              https://builds.apache.org/job/PreCommit-HADOOP-Build/8635/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_72.txt
          JDK v1.7.0_95 Test Results  https://builds.apache.org/job/PreCommit-HADOOP-Build/8635/testReport/
          modules                     C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output              https://builds.apache.org/job/PreCommit-HADOOP-Build/8635/console
          Powered by                  Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          vinayrpet Vinayakumar B added a comment -

          +1.
          DFS is not affected, because it overrides getFileBlockLocations(Path p, long start, long len) and that override needs only the path, not the FileStatus.
          Will commit it shortly.
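Why DFS is unaffected can be sketched with a similar hypothetical model. `PathNativeFs` overrides the path-based overload to answer from the path alone, as the comment describes for DFS; its status-based overload is assumed here to simply forward the path. All names are invented for illustration and do not reproduce `DistributedFileSystem`.

```java
// Hypothetical model: BaseFs mimics the default overload pair, and PathNativeFs
// mimics a file system (like DFS, per the comment) that resolves locations from the path.
class PathOverrideDemo {
    public static void main(String[] args) {
        PathNativeFs fs = new PathNativeFs();
        Status st = new Status("/dir/a", 1024L);
        for (boolean passStatus : new boolean[] {false, true}) {
            String[] hosts = fs.locate(st, passStatus);
            System.out.println("passStatus=" + passStatus + ": " + hosts.length
                    + " hosts, " + fs.statusCalls + " status lookups so far");
        }
    }

    // Stand-in for FileStatus: a path plus a length.
    static final class Status {
        final String path;
        final long len;

        Status(String path, long len) {
            this.path = path;
            this.len = len;
        }
    }

    static class BaseFs {
        int statusCalls = 0;

        // Models getFileStatus(Path); counted because it is the redundant call.
        Status getStatus(String path) {
            statusCalls++;
            return new Status(path, 1024L);
        }

        // Default status overload: a single local "host".
        String[] blockLocations(Status s, long start, long len) {
            return new String[] {"localhost"};
        }

        // Default path overload: extra lookup, then delegate to the status overload.
        String[] blockLocations(String path, long start, long len) {
            return blockLocations(getStatus(path), start, len);
        }

        // One per-file step of the listing; passStatus=true mirrors the patched code.
        String[] locate(Status s, boolean passStatus) {
            return passStatus ? blockLocations(s, 0, s.len) : blockLocations(s.path, 0, s.len);
        }
    }

    static class PathNativeFs extends BaseFs {
        // Answers from the path alone (e.g. one metadata request); no status lookup.
        @Override
        String[] blockLocations(String path, long start, long len) {
            return new String[] {"dn1", "dn2"};
        }

        // Assumed DFS-like behaviour: forward to the path overload.
        @Override
        String[] blockLocations(Status s, long start, long len) {
            return s == null ? null : blockLocations(s.path, start, len);
        }
    }
}
```

Both call styles reach the path override and neither performs a status lookup, which is why the change is neutral for a file system with this shape.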

          vinayrpet Vinayakumar B added a comment -

          Committed to branch-2.7 and above.
          Thanks Ryan Blue.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9309 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9309/)
          HADOOP-12810. FileSystem#listLocatedStatus causes unnecessary RPC calls (vinayakumarb: rev 96ea3094315bb1e1a5e268e3817c7fdedc3e9462)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          rdblue Ryan Blue added a comment -

          Thanks for reviewing and committing this so quickly, Vinayakumar B!

          brahmareddy Brahma Reddy Battula added a comment -

          The following test case has been failing since this JIRA was committed:

          FAILED:  org.apache.hadoop.mapred.TestFileInputFormat.testSplitLocationInfo[0]
          
          Error Message:
          expected:<2> but was:<1>
          
          Stack Trace:
          java.lang.AssertionError: expected:<2> but was:<1>
          	at org.junit.Assert.fail(Assert.java:88)
          	at org.junit.Assert.failNotEquals(Assert.java:743)
          	at org.junit.Assert.assertEquals(Assert.java:118)
          	at org.junit.Assert.assertEquals(Assert.java:555)
          	at org.junit.Assert.assertEquals(Assert.java:542)
          	at org.apache.hadoop.mapred.TestFileInputFormat.testSplitLocationInfo(TestFileInputFormat.java:115)
          
          vinayrpet Vinayakumar B added a comment -

          Thanks Brahma Reddy Battula for pointing this out.
          MockFileSystem overrides public BlockLocation[] getFileBlockLocations(Path p, long start, long len) throws IOException, but listLocatedStatus now calls the FileStatus overload, so that override is no longer reached. Overriding public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len) throws IOException instead will resolve the issue.
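The test failure comes from Java overload dispatch: once the listing passes the FileStatus, an override of only the Path overload is no longer reached. Below is a hypothetical model (invented names, not the real `MockFileSystem` or `TestFileInputFormat` code) in which the base status overload is assumed to return a single default host.

```java
// Hypothetical model of why a Path-only override stops being used after the fix.
class MockOverrideDemo {
    public static void main(String[] args) {
        Status st = new Status("/dir/a", 1024L);
        System.out.println("path-only mock:      " + new PathOnlyMock().locate(st).length + " host(s)");
        System.out.println("status-override mock: " + new StatusMock().locate(st).length + " host(s)");
    }

    // Stand-in for FileStatus: a path plus a length.
    static final class Status {
        final String path;
        final long len;

        Status(String path, long len) {
            this.path = path;
            this.len = len;
        }
    }

    static class BaseFs {
        Status getStatus(String path) {
            return new Status(path, 1024L);
        }

        // Assumed default: one local host, as for a non-distributed file system.
        String[] blockLocations(Status s, long start, long len) {
            return new String[] {"localhost"};
        }

        // Path overload: look up the status, then delegate to the status overload.
        String[] blockLocations(String path, long start, long len) {
            return blockLocations(getStatus(path), start, len);
        }

        // Patched per-file listing step: passes the status it already has.
        String[] locate(Status s) {
            return blockLocations(s, 0, s.len);
        }
    }

    // Overrides only the path overload, so the patched listing bypasses it.
    static class PathOnlyMock extends BaseFs {
        @Override
        String[] blockLocations(String path, long start, long len) {
            return new String[] {"host1", "host2"};
        }
    }

    // Overrides the status overload, which both the listing and the path overload reach.
    static class StatusMock extends BaseFs {
        @Override
        String[] blockLocations(Status s, long start, long len) {
            return new String[] {"host1", "host2"};
        }
    }
}
```

In this model the path-only mock reports 1 host through the patched listing while the status-override mock reports 2, the same shape as the expected:<2> but was:<1> failure and the suggested fix.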

          brahmareddy Brahma Reddy Battula added a comment -

          Yes, I raised MAPREDUCE-6637 for this.

          ctrezzo Chris Trezzo added a comment -

          Adding 2.6.5 to the target versions with the intention of backporting this to branch-2.6. We would also backport the associated MAPREDUCE-6637 for the test fix. Please let me know if you think otherwise. Thanks!

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          sjlee0 Sangjin Lee added a comment -

          Cherry-picked it for 2.6.5 (trivial). I'll also pick up MAPREDUCE-6637.

          ctrezzo Chris Trezzo added a comment -

          Thanks!


            People

            • Assignee:
              rdblue Ryan Blue
              Reporter:
              rdblue Ryan Blue
            • Votes:
              0
              Watchers:
              16

              Dates

              • Created:
                Updated:
                Resolved:
