Hadoop Common / HADOOP-4717

Removal of default port# in NameNode.getUri() causes a map/reduce job to fail to promote temporary output

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The problem reported here is that when the default port number (8020) is specified in the output path, the job succeeds but no output is created. The cause is that the "listStatus" call drops the port number, because NameNode.getUri() removes the default port#.

      Assume that a map/reduce output directory is set to "hdfs://localhost:8020/out". A "listStatus" call on any of its subdirectories, for example "hdfs://localhost:8020/out/tempXX", returns results like the one below:

      hdfs://localhost/out/tempXX/part-00005

      Because of this, the following code in Task.java

      574 private Path getFinalPath(Path jobOutputDir, Path taskOutput) {
      575 URI relativePath = taskOutputPath.toUri().relativize(taskOutput.toUri());

      does not compute the correct relativePath, because taskOutputPath contains the port but taskOutput does not.
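
The mismatch can be reproduced with plain java.net.URI, outside Hadoop. The sketch below is illustrative only: the class name RelativizeDemo and its helper are hypothetical, and the URIs are the examples from this description. URI.relativize returns the given URI unchanged when the authorities of the two URIs are not identical, which is the case when only one of them carries the port.

```java
import java.net.URI;

public class RelativizeDemo {
    /** Relativizes child against base with java.net.URI, the same call Task.java line 575 makes. */
    public static URI relativize(String base, String child) {
        return URI.create(base).relativize(URI.create(child));
    }

    public static void main(String[] args) {
        String child = "hdfs://localhost/out/tempXX/part-00005";

        // Authorities differ ("localhost:8020" vs "localhost"),
        // so the child URI comes back unchanged instead of a relative path.
        System.out.println(relativize("hdfs://localhost:8020/out/tempXX", child));

        // Authorities match, so only the remaining path segment is returned.
        System.out.println(relativize("hdfs://localhost/out/tempXX", child));
    }
}
```

In the first call the result is still an absolute URI, so any code that treats it as a path relative to the job output directory ends up with the wrong final location.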

      It seems to me that the problem could be fixed if we make Path.makeQualified() return the same path no matter whether the input path contains the default port or not.

      1. HADOOP-4717.patch
        1 kB
        Doug Cutting
      2. relativePath.patch
        4 kB
        Hairong Kuang
      3. relativePath1.patch
        4 kB
        Hairong Kuang

          Activity

          Hudson added a comment - Integrated in Hadoop-trunk #680 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/680/ )
          Hairong Kuang added a comment -

          I've just committed this.

          Hairong Kuang added a comment -

          Ant test-core succeeded:
          BUILD SUCCESSFUL
          Total time: 113 minutes 54 seconds

          Ant test-patch succeeded:
          [exec] +1 overall.

          [exec] +1 @author. The patch does not contain any @author tags.

          [exec] +1 tests included. The patch appears to include 3 new or modified tests.

          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          Doug Cutting added a comment -

          +1 This looks good to me.

          Hairong Kuang added a comment -

          This new patch makes two changes in my newly added unit test:
          1. When the dfs cluster fails to start because of a server binding exception, log the error and skip the test;
          2. The map/reduce job has an output path that includes the default NameNode port#.

          Hairong Kuang added a comment -

          In addition to Doug's change, this patch
          1. throws IOException if relativize fails, as Koji suggested;
          2. adds a unit test to make sure a map/reduce job with an output path containing no port works.
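
The fail-fast check in item 1 can be sketched as follows. This is not the code from the patch: the class StrictRelativize, the method relativizeOrFail, and the error message are hypothetical, and the check relies only on the documented behavior that URI.relativize returns the given URI when it cannot relativize.

```java
import java.io.IOException;
import java.net.URI;

public class StrictRelativize {
    /**
     * Relativizes child against base and throws instead of silently
     * returning the child when no relative path can be computed.
     */
    public static URI relativizeOrFail(URI base, URI child) throws IOException {
        URI relative = base.relativize(child);
        // URI.relativize hands back the child itself when scheme, authority
        // or path prefix do not match, so an unchanged result means failure.
        if (relative.equals(child)) {
            throw new IOException("Cannot get the relative path: base = "
                + base + " child = " + child);
        }
        return relative;
    }

    public static void main(String[] args) throws IOException {
        URI base = URI.create("hdfs://localhost/out/tempXX");
        URI child = URI.create("hdfs://localhost/out/tempXX/part-00005");
        System.out.println(relativizeOrFail(base, child));
    }
}
```

With a check of this kind, a port mismatch between the two URIs surfaces as a failed task rather than as a job that reports success without output.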

          Koji Noguchi added a comment -

          I understand that HADOOP-4717 & HADOOP-4746 would fix the problem, but can we throw an Exception when

          575 URI relativePath = taskOutputPath.toUri().relativize(taskOutput.toUri());

          doesn't return a relativePath?
          If we hit a similar issue again, I would rather have the job fail
          than have the job return 0 while silently deleting the output.

          Hairong Kuang added a comment -

          Yes, it works as long as we also fix HADOOP-4746. Doug, could you please include a junit test?

          Doug Cutting added a comment -

          Here's a patch that changes DistributedFileSystem#makeQualified() to remove the default port if it's specified. Does that fix things for you?
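
The normalization described here can be illustrated with plain java.net.URI. This is a sketch under stated assumptions, not the contents of the attached patch: the class DefaultPortNormalizer and the method stripDefaultPort are hypothetical, and the default port is passed in as a parameter (8020 in this issue).

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DefaultPortNormalizer {
    /**
     * Returns the URI without its port when the port equals defaultPort;
     * any other URI is returned unchanged.
     */
    public static URI stripDefaultPort(URI uri, int defaultPort) {
        if (uri.getPort() != defaultPort) {
            return uri; // no port given, or a non-default port: leave as-is
        }
        try {
            // A port of -1 means "undefined" in the java.net.URI constructor.
            return new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(),
                -1, uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI: " + uri, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripDefaultPort(URI.create("hdfs://localhost:8020/out"), 8020));
        System.out.println(stripDefaultPort(URI.create("hdfs://localhost:9000/out"), 8020));
    }
}
```

Normalizing both the job output directory and the listed task output this way gives them the same authority, which is what URI.relativize needs in order to produce a relative path.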

          Hairong Kuang added a comment -

          I wrote a test. It showed that FileSystem#makeQualified() did not remove the default port# even if the input path contains the default port #.

          Doug Cutting added a comment -

          It seems to me that the output directory should somewhere be normalized by calling FileSystem#makeQualified() on it, so that it's of the form "hdfs://localhost/out/".


            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0
              Watchers:
              1
