Hadoop Common / HADOOP-3760

DFS operations fail because of Stream closed error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.1, 0.18.0
    • Fix Version/s: 0.17.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Fix a bug with HDFS file close() mistakenly introduced by HADOOP-3681.

      Description

      DFS operations fail because of java.io.IOException: Stream closed.
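      For context, here is a minimal sketch of the client-side call path that hits this bug. It is an illustration only: the path, data, and configuration are hypothetical, and it assumes a running HDFS on an affected version (0.17.1 or 0.18.0).

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class StreamClosedDemo {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);        // DistributedFileSystem when defaults point at HDFS
          Path p = new Path("/tmp/hadoop-3760-demo");  // hypothetical path
          FSDataOutputStream out = fs.create(p);
          out.write("some bytes".getBytes());
          // On affected versions, this close() can fail with
          // java.io.IOException: Stream closed. even though the caller
          // never closed the stream (see the stack traces in the activity).
          out.close();
        }
      }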

      Attachments

      1. HADOOP-3760-trunk.patch
        0.9 kB
        Lohit Vijayarenu
      2. HADOOP-3760-trunk.patch
        1 kB
        Lohit Vijayarenu
      3. HADOOP-3760-18.patch
        0.9 kB
        Lohit Vijayarenu
      4. HADOOP-3760-18.patch
        1 kB
        Lohit Vijayarenu
      5. HADOOP-3760-17.patch
        0.9 kB
        Lohit Vijayarenu
      6. HADOOP-3760-17.patch
        1 kB
        Lohit Vijayarenu

        Activity

        Hudson added a comment - Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )
        Raghu Angadi added a comment -

        I just committed this. Thanks Lohit!
        Lohit Vijayarenu added a comment -

        Thanks Raghu. Updated patch with comments.
        Raghu Angadi added a comment -

        +1. Looks good.

        It would be better to comment on why isClosed() is called and closed is set to true right after flushInternal(). It will save some head-scratching in the future when we look at it.
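        As an illustration of what such a comment might look like, here is a hedged sketch of the close path. The method and field names (closeInternal, flushInternal, isClosed, closed) come from the discussion and stack traces, but the surrounding code is an assumption, not the actual patch.

        // Hypothetical post-fix shape of DFSOutputStream.closeInternal();
        // not the actual HADOOP-3760 patch.
        private synchronized void closeInternal() throws IOException {
          isClosed();       // fail fast if an earlier error already closed the stream
          flushInternal();  // push any remaining packets to the datanodes

          // Check isClosed() again right after flushInternal(): flushing may
          // have surfaced an asynchronous error from the streamer thread,
          // which is the check HADOOP-3681 needs.
          isClosed();

          // Set closed before the completeFile() retry loop from HADOOP-3681,
          // so no later isClosed() call throws "Stream closed" while we are
          // still legitimately inside close().
          closed = true;

          // ... wait/retry until the namenode reports the file complete ...
        }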
        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good
        Lohit Vijayarenu added a comment -

        Here is the test-patch output. I ran tests on both 0.18 and trunk. All tests pass.
        [exec]
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
        [exec] Please justify why no tests are needed for this patch.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]

        Lohit Vijayarenu added a comment -

        I was able to reproduce this by commenting out the call to complete the file (a request to the namenode). The change in HADOOP-3681 waits for a while, retrying 10 times for the file to complete, and then checks isClosed(). By that time closed is already set to true, so it throws a stream-closed exception. An isClosed() call right after flushInternal() should do the job needed for HADOOP-3681. I tested both cases and it fixes the problem.
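        A hedged sketch of the pre-fix flow this comment describes (method names from the stack traces; the control flow and retry details are assumptions, not the actual 0.17/0.18 source):

        // Hypothetical pre-fix shape of closeInternal().
        private synchronized void closeInternal() throws IOException {
          isClosed();
          flushInternal();
          closed = true;  // stream already marked closed here

          // Retry loop added by HADOOP-3681: wait for the namenode to
          // acknowledge the file as complete, retrying about 10 times.
          int retries = 10;
          while (!namenode.complete(src, clientName) && retries-- > 0) {
            try {
              Thread.sleep(400);  // back off before retrying
            } catch (InterruptedException ie) {
              // ignored in this sketch
            }
            // Bug: closed is already true, so this throws
            // java.io.IOException: Stream closed.
            isClosed();
          }
        }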
        Lohit Vijayarenu added a comment -

        I see the comment that job submission fails, which isn't acceptable. Promoting this for 0.18 until we know the cause.
        Lohit Vijayarenu added a comment -

        DFSClient was recently changed in HADOOP-3681 to fix a bug with an infinite loop while writing. This could be related. The second exception shows the reduce failing instead of hanging forever, as it did before that fix. Trying to reproduce the first case.
        Amar Kamat added a comment -

        Here is the complete error:

        08/07/15 10:21:04 INFO mapred.FileInputFormat: Total input paths to process : 100
        java.io.IOException: Stream closed.
               at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2240)
               at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2744)
               at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:2652)
               at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
               at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
               at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:726)
               at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:966)
               at org.apache.hadoop.examples.Sort.run(Sort.java:147)
               at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
               at org.apache.hadoop.examples.Sort.main(Sort.java:158)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
               at java.lang.reflect.Method.invoke(Method.java:597)
               at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
               at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
               at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:59)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
               at java.lang.reflect.Method.invoke(Method.java:597)
               at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        Exception closing file system.dir/job/job.xml
        java.io.IOException: Stream closed.
               at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2240)
               at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2687)
               at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:2652)
               at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:220)
               at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:236)
               at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1379)
               at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:230)
               at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:215)
        
        

        This can be reproduced as follows (a sketch follows the steps):
        1) Run random-writer.
        2) Run sort; job submission fails with the error above.
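        A hypothetical Java driver for these two steps, using the example classes named in the stack trace; the output paths and argument layout are assumptions.

        import org.apache.hadoop.examples.RandomWriter;
        import org.apache.hadoop.examples.Sort;
        import org.apache.hadoop.util.ToolRunner;

        public class Hadoop3760Repro {
          public static void main(String[] args) throws Exception {
            // Step 1: write random data (output directory is a placeholder).
            ToolRunner.run(new RandomWriter(), new String[] { "/tmp/rand" });
            // Step 2: sort it. On affected versions, job submission fails
            // while closing job.xml with java.io.IOException: Stream closed.
            ToolRunner.run(new Sort(), new String[] { "/tmp/rand", "/tmp/rand-sorted" });
          }
        }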

        Reduce tasks also fail with the following error:

        2008-07-15 09:31:28,631 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: 
        org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:tmp-dir/part-00000
        	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1115)
        	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:340)
        	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at java.lang.reflect.Method.invoke(Method.java:597)
        	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
        
        	at org.apache.hadoop.ipc.Client.call(Client.java:707)
        	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        	at $Proxy1.addBlock(Unknown Source)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at java.lang.reflect.Method.invoke(Method.java:597)
        	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        	at $Proxy1.addBlock(Unknown Source)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2445)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2328)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1740)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1917)
        
        

          People

          • Assignee: Lohit Vijayarenu
          • Reporter: Amar Kamat
          • Votes: 0
          • Watchers: 0
