Apache Tez / TEZ-966

Tez AM has invalid state transition error when datanode is bad.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

    Description

      The Tez AM hits an invalid event error (V_TASK_COMPLETED at RUNNING) right after the HDFS client reports that the datanode is bad. AM log:

      java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.18.145.215:35766 remote=/172.18.145.215:50010]
      	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
      	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
      	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
      	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
      	at java.io.FilterInputStream.read(FilterInputStream.java:83)
      	at java.io.FilterInputStream.read(FilterInputStream.java:83)
      	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1985)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796)
      2014-03-20 08:27:09,529 WARN [AsyncDispatcher event handler] org.apache.hadoop.hdfs.DFSClient: Error while syncing
      java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
      2014-03-20 08:27:09,530 WARN [AsyncDispatcher event handler] org.apache.tez.dag.history.recovery.RecoveryService: Error handling summary event, eventType=VERTEX_FINISHED
      java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
      2014-03-20 08:27:09,530 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Failed to send vertex finished event to recovery
      java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
      2014-03-20 08:27:09,531 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_TASK_COMPLETED on vertex initialmap with vertexId vertex_1395294589125_0141_1_00 at current state RUNNING
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_TASK_COMPLETED at RUNNING
      	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1202)
      	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:155)
      	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1549)
      	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1535)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
      	at java.lang.Thread.run(Thread.java:722)
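
One possible reading of the trace above, offered as a sketch and not as the confirmed root cause: the exception is raised from StateMachineFactory$MultipleInternalArc.doTransition, and in Hadoop's state machine a multiple-arc transition throws InvalidStateTransitonException when its hook returns a state that was not declared as a valid post-state. If the failed recovery write made the V_TASK_COMPLETED handler return such an undeclared state, the log would look like this. The class, states, and hook below are hypothetical stand-ins, not Tez or Hadoop code, and this report does not say whether TEZ-966.1.patch takes this approach.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

public class MultiArcSketch {
    // Hypothetical states, loosely named after the log; not Tez's real enum.
    enum State { RUNNING, SUCCEEDED, FAILED }

    /** Thrown when a hook returns a state the arc did not declare. */
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(State from, String event) {
            super("Invalid event: " + event + " at " + from);
        }
    }

    /** A transition whose post-state is chosen by the hook at run time. */
    static class MultiArc {
        private final Set<State> validPostStates;
        private final Supplier<State> hook;

        MultiArc(Set<State> validPostStates, Supplier<State> hook) {
            this.validPostStates = validPostStates;
            this.hook = hook;
        }

        State doTransition(State oldState, String event) {
            State post = hook.get();
            // The hook may only return states that were declared up front.
            if (!validPostStates.contains(post)) {
                throw new InvalidTransitionException(oldState, event);
            }
            return post;
        }
    }

    /** Hypothetical hook: picks FAILED when the simulated recovery write fails. */
    static State onTaskCompleted(boolean recoveryWriteFails) {
        return recoveryWriteFails ? State.FAILED : State.SUCCEEDED;
    }

    /** Runs one V_TASK_COMPLETED transition from RUNNING and reports the outcome. */
    static String run(Set<State> declared, boolean recoveryWriteFails) {
        MultiArc arc = new MultiArc(declared, () -> onTaskCompleted(recoveryWriteFails));
        try {
            return arc.doTransition(State.RUNNING, "V_TASK_COMPLETED").name();
        } catch (InvalidTransitionException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Set<State> narrow = EnumSet.of(State.RUNNING, State.SUCCEEDED);
        Set<State> wide = EnumSet.of(State.RUNNING, State.SUCCEEDED, State.FAILED);
        System.out.println(run(narrow, false)); // SUCCEEDED
        System.out.println(run(narrow, true));  // Invalid event: V_TASK_COMPLETED at RUNNING
        System.out.println(run(wide, true));    // FAILED
    }
}
```

In this sketch the error disappears once the failure state is added to the declared post-states, which is why the message names the event and the current state but not the underlying I/O failure.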
      

      Attachments

        1. TEZ-966.1.patch
          0.9 kB
          Hitesh Shah

          People

            Assignee: Hitesh Shah (hitesh)
            Reporter: Tassapol Athiapinya (tassapola)
