Hadoop HDFS / HDFS-15219

DFS client gets stuck when ResponseProcessor.run throws an Error

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.3
    • Fix Version/s: 3.3.0, 3.1.4, 3.2.2
    • Component/s: hdfs-client
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In my case, a Tez application was stuck for more than 2 hours until we killed it. The application hung because one task attempt was stuck and speculative execution was disabled.

      The task log shows the following exception:

      2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100000
      2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 1000000
      2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 1000000
      2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
      java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
       at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
       at java.lang.String.valueOf(String.java:2847)
       at java.lang.StringBuilder.append(StringBuilder.java:128)
       at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
      Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
       at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
       ... 4 more
      Caused by: java.util.zip.ZipException: error reading zip file
       at java.util.zip.ZipFile.read(Native Method)
       at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
       at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
       at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
       at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
       at sun.misc.Resource.getBytes(Resource.java:124)
       at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
       at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
       at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
       ... 10 more
      2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
      2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
      2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
      2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
      
      

      The root cause is an uncaught Error. At 01:29 a disk failed, so loading a class from a jar on that disk threw NoClassDefFoundError. ResponseProcessor.run catches only Exception, so it cannot catch NoClassDefFoundError. As a result, ResponseProcessor never set errorState. DataStreamer then did not know that ResponseProcessor was dead and could not trigger closeResponder, so it stayed stuck in DataStreamer.run.

      I reproduced this with the unit test TestDataStream.testDfsClient. When I throw NoClassDefFoundError in ResponseProcessor.run, TestDataStream.testDfsClient fails with a timeout.

      I think we should catch Throwable instead of Exception in ResponseProcessor.run.
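
      The failure mode can be illustrated outside HDFS with a minimal standalone sketch. It is not the actual DFSOutputStream code: the class UncaughtErrorDemo, the thread name "responder", and the errorState flag are hypothetical stand-ins. It only shows that a catch (Exception) block is skipped when the thread body throws NoClassDefFoundError, so the flag is never set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the responder thread; not the real HDFS class. */
public class UncaughtErrorDemo {

    /**
     * Runs a worker whose body throws NoClassDefFoundError while catching
     * only Exception, then reports whether the error flag was set.
     */
    public static boolean errorFlagSetWhenCatchingException()
            throws InterruptedException {
        AtomicBoolean errorState = new AtomicBoolean(false);
        Thread responder = new Thread(() -> {
            try {
                // Simulates the class-loading failure from the log above.
                throw new NoClassDefFoundError("com/google/protobuf/TextFormat");
            } catch (Exception e) {
                // Never reached: NoClassDefFoundError is an Error, not an Exception.
                errorState.set(true);
            }
        }, "responder");
        // Swallow the escaping Error so the demo prints no stack trace.
        responder.setUncaughtExceptionHandler((t, e) -> { });
        responder.start();
        responder.join();
        return errorState.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "errorState set: false"
        System.out.println("errorState set: " + errorFlagSetWhenCatchingException());
    }
}
```

      The thread has terminated, but nothing that polls errorState can tell, which matches the hang seen in the log between 01:29 and 03:27.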
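
      A rough sketch of the proposed change, under the same assumptions as the description: this is a simplified model, not the attached patch. CatchThrowableDemo, processAck, and the responderClosed latch are hypothetical names. The "streamer" side waits a bounded 500 ms for the responder to report failure, so the sketch terminates in both cases.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the responder/streamer pair; not the HDFS classes. */
public class CatchThrowableDemo {

    /** Stands in for the ack-processing code that failed to load a class. */
    static void processAck() {
        throw new NoClassDefFoundError("com/google/protobuf/TextFormat");
    }

    /** Returns true if the waiting side learns that the responder failed. */
    public static boolean streamerNoticesFailure(boolean catchThrowable)
            throws InterruptedException {
        AtomicBoolean errorState = new AtomicBoolean(false);
        CountDownLatch responderClosed = new CountDownLatch(1);

        Runnable body = () -> {
            if (catchThrowable) {
                try {
                    processAck();
                } catch (Throwable t) {   // proposed: also covers Error
                    errorState.set(true);
                    responderClosed.countDown();
                }
            } else {
                try {
                    processAck();
                } catch (Exception e) {   // current: the Error escapes
                    errorState.set(true);
                    responderClosed.countDown();
                }
            }
        };
        Thread responder = new Thread(body, "responder");
        // Swallow the escaping Error so the demo prints no stack trace.
        responder.setUncaughtExceptionHandler((t, e) -> { });
        responder.start();

        // Bounded wait in place of the unbounded wait that caused the hang.
        boolean noticed = responderClosed.await(500, TimeUnit.MILLISECONDS);
        responder.join();
        return noticed && errorState.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "catch Exception: noticed=false"
        System.out.println("catch Exception: noticed=" + streamerNoticesFailure(false));
        // prints "catch Throwable: noticed=true"
        System.out.println("catch Throwable: noticed=" + streamerNoticesFailure(true));
    }
}
```

      Catching Throwable is broad, but here the handler only records the failure and lets the thread end, which is what the waiting side needs in order to start recovery.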

       

      Attachments

        1. HDFS-15219.001.patch
          0.7 kB
          Chenyu Zheng

            People

              Assignee: zhengchenyu Chenyu Zheng
              Reporter: zhengchenyu Chenyu Zheng
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: 672h
                  Remaining: 672h
                  Logged: Not Specified