Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.0.3, 2.4.0, 3.0.0
    • Fix Version/s: 2.5.0
    • Component/s: fs/s3
    • Labels:
      None
    • Environment:

      Hadoop with default configurations

    • Tags:
      mapreduce, s3, mr, hadoop

      Description

      I'm running a wordcount MR job as follows:

      hadoop jar WordCount.jar wordcount.WordCountDriver s3n://bucket/wordcount/input s3n://bucket/wordcount/output

      s3n://bucket/wordcount/input is an S3 "directory" that contains the input files.

      However, I get the following NPE:

      12/10/02 18:56:23 INFO mapred.JobClient: map 0% reduce 0%
      12/10/02 18:56:54 INFO mapred.JobClient: map 50% reduce 0%
      12/10/02 18:56:56 INFO mapred.JobClient: Task Id : attempt_201210021853_0001_m_000001_0, Status : FAILED
      java.lang.NullPointerException
      at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
      at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
      at java.io.FilterInputStream.close(FilterInputStream.java:155)
      at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
      at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
      at org.apache.hadoop.mapred.Child.main(Child.java:249)

      The MR job runs fine if I specify a more specific input path such as s3n://bucket/wordcount/input/file.txt.

      It fails if I pass an S3 folder as the parameter.

      In summary,
      This works
      hadoop jar ./hadoop-examples-1.0.3.jar wordcount /user/hadoop/wordcount/input/ s3n://bucket/wordcount/output/

      This doesn't work
      hadoop jar ./hadoop-examples-1.0.3.jar wordcount s3n://bucket/wordcount/input/ s3n://bucket/wordcount/output/

      (both input paths are directories)

        Issue Links

          Activity

          Steve Loughran added a comment -

          marking as relates-to HADOOP-10589, though they are dissimilar in root cause; the checking for null input stream at close is different

          Steve Loughran added a comment -

          Looking at this code, the only way the inner "in" stream can be null is if the store.retrieve(key) operation returned null. It does this if the attempt to retrieve the data raises any exception that is not an instance or subclass of IOException, or if the HTTP response is null.

          Either

          1. Something is going wrong with the GET, and that exception is being logged at debug, then discarded.
          2. The HTTP request is returning a null response and this is not being picked up.

          Having the constructor and seek() operations check for null values and raise an IOE is the solution here; making close() more robust is wise, but does not address the real problem
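          A minimal sketch of that idea, converting a null retrieve result into an IOException at the call site rather than letting it surface later as an NPE in close(). The helper class and method names here are hypothetical, not the actual NativeS3FileSystem code:

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch: validate the stream handed back by the store before using it.
// In the real code, store.retrieve(key) can return null; the idea is that
// the constructor and seek() would pass the result through a check like
// this and fail fast with context instead of NPE-ing later in close().
final class S3StreamChecks {
    private S3StreamChecks() {}

    static InputStream requireStream(InputStream in, String key)
            throws IOException {
        if (in == null) {
            // The GET failed or returned no body; raise an IOE with the key.
            throw new IOException("Null data stream retrieved for key: " + key);
        }
        return in;
    }
}
```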

          Steve Loughran added a comment -

          Can we leave this open? I'll move it to fs/s3, as the check for null input on close is still there.

          Chen He added a comment -

          Or close it if it is not a problem for 1.x either.

          Chen He added a comment -

          Hi Benjamin Kim
          Thank you for the reply. Since it is not a problem for 2.x, would you mind removing 2.x from the target versions?

          Benjamin Kim added a comment -

          Hi Chen
          I tested it with CDH4.5.0 (hadoop-2.0.0+1518) and it doesn't seem to have the same problem. I'm able to successfully run a wordcount MRv1 job with the s3n protocol.
          So is it pretty safe to say this issue is fixed on the 2.x.x versions?

          Chen He added a comment -

          Hi Benjamin Kim
          This JIRA has had no updates since 11/Oct/12. Is it still a problem? Right now it is time to clean up 0.23 JIRAs; if it is still a problem in 2.x, please retarget it to 2.x. Thanks!

          Steve Loughran added a comment -

          Looks like this is triggered on the in.close() line, if the input stream is null.

          This shouldn't happen at construction time, because the opening code should have failed, but it does appear possible in the seek() operation, which first closes the existing stream, then calls store.retrieve(key, pos), an operation that can return null if S3 doesn't have that key.

          At the very least, close() should be made robust against a null inner input stream; maybe the seek operation should convert a null retrieve operation into an IOException.
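          A null-tolerant close() along those lines might look like the following. This is a simplified sketch, not the actual Hadoop patch; only the inner field name "in" is taken from the stack trace above:

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch of a defensive close() for the native S3 input stream wrapper.
// The inner stream may be null if a re-open inside seek() failed, so
// close() guards against that (and is idempotent) instead of NPE-ing.
class RobustS3InputStream {
    private InputStream in; // may be null after a failed re-open

    RobustS3InputStream(InputStream in) {
        this.in = in;
    }

    public void close() throws IOException {
        if (in != null) {
            try {
                in.close();
            } finally {
                // Null out the reference so a second close() is a no-op.
                in = null;
            }
        }
    }
}
```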


            People

            • Assignee:
              Steve Loughran
              Reporter:
              Benjamin Kim
            • Votes:
              0
              Watchers:
              4
