Hadoop Map/Reduce / MAPREDUCE-4164

Hadoop 0.22: Exception thrown after task completion causes the task to be re-executed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.1
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      2012-02-28 19:17:08,504 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 1969310 bytes
      2012-02-28 19:17:08,694 INFO org.apache.hadoop.mapred.Task: Task:attempt_201202272306_0794_m_000094_0 is done. And is in the process of commiting
      2012-02-28 19:18:08,774 INFO org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Call to /127.0.0.1:35400 failed on local exception: java.nio.channels.ClosedByInterruptException
      at org.apache.hadoop.ipc.Client.wrapException(Client.java:1094)
      at org.apache.hadoop.ipc.Client.call(Client.java:1062)
      at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
      at $Proxy0.statusUpdate(Unknown Source)
      at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:650)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.nio.channels.ClosedByInterruptException
      at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
      at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
      at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:60)
      at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
      at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:151)
      at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:112)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
      at java.io.DataOutputStream.flush(DataOutputStream.java:106)
      at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:769)
      at org.apache.hadoop.ipc.Client.call(Client.java:1040)
      ... 4 more

      2012-02-28 19:18:08,825 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201202272306_0794_m_000094_0' done.

      Expected log output, as seen for a task attempt that finished without the communication exception:
      2012-02-28 19:17:02,214 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 1974104 bytes
      2012-02-28 19:17:02,408 INFO org.apache.hadoop.mapred.Task: Task:attempt_201202272306_0794_m_000000_0 is done. And is in the process of commiting
      2012-02-28 19:17:02,519 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201202272306_0794_m_000000_0' done.
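
      The difference between the two excerpts is easiest to see as the gap between the "is done. And is in the process of commiting" line and the line that follows it. The snippet below is only an illustrative sketch for computing that gap from the timestamps quoted above; the class and method names (LogGap, gapMillis) are made up for this example and are not part of Hadoop.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Timestamp layout used in the log lines above, e.g. "2012-02-28 19:17:08,694".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Returns the number of milliseconds between two log timestamps. */
    public static long gapMillis(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TIME),
                LocalDateTime.parse(later, LOG_TIME)).toMillis();
    }

    public static void main(String[] args) {
        // Failing attempt: commit line -> communication exception.
        System.out.println(gapMillis("2012-02-28 19:17:08,694", "2012-02-28 19:18:08,774")); // 60080
        // Healthy attempt: commit line -> done.
        System.out.println(gapMillis("2012-02-28 19:17:02,408", "2012-02-28 19:17:02,519")); // 111
    }
}
```

      The failing attempt sits for roughly 60 seconds before the exception is logged, while the healthy attempt finishes in about a tenth of a second.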

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-22-branch #101 (See https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/101/)
        MAPREDUCE-4164. Fix Communication exception thrown after task completion. Contributed by Mayank Bansal. (Revision 1329486)

        Result = SUCCESS
        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1329486
        Files :

        • /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
        • /hadoop/common/branches/branch-0.22/mapreduce/src/java/org/apache/hadoop/mapred/Task.java
        Konstantin Shvachko added a comment -

        I just committed this. Thank you Mayank.

        Konstantin Shvachko added a comment -

        +1
        This was affecting our benchmarks because it caused tasks to be re-executed.
        Now benchmarks look really good.

        Mayank Bansal added a comment -

        1. The TaskReporter thread periodically sends status updates or pings to the TaskTracker. If it needs to report task progress, it sends a STATUS_UPDATE message
        to the TaskTracker. Otherwise, it sends a PING to check whether the TaskTracker is alive.

        2. When the map/reduce phase is over, stopCommunicationThread() is called, which interrupts the ping/status-update thread.

        3. If that thread is communicating with the server when the interrupt arrives, the interrupt breaks the connection to the
        server, and the stream throws ClosedByInterruptException.

        4. However, in Client.java the Client keeps waiting for the response, eventually times out, and re-throws this exception.
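
        The interrupt behaviour described in step 3 can be reproduced outside Hadoop. What follows is a minimal sketch, not the Hadoop code or the committed patch: it assumes a java.nio.channels.Pipe as a stand-in for the RPC socket, and the class and method names (InterruptedWriteDemo, writeWithInterruptPending) are hypothetical. It relies on the documented java.nio rule that a thread whose interrupt status is set when it starts a blocking I/O operation on an interruptible channel receives ClosedByInterruptException and the channel is closed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;

public class InterruptedWriteDemo {

    /**
     * Writes to an interruptible channel while the current thread's interrupt
     * flag is set. Returns true if the write failed with
     * ClosedByInterruptException, false if it completed normally.
     */
    public static boolean writeWithInterruptPending() {
        Pipe pipe;
        try {
            pipe = Pipe.open(); // stand-in for the RPC connection (assumption)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            // Stand-in for stopCommunicationThread() interrupting the reporter thread.
            Thread.currentThread().interrupt();
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
            return false;
        } catch (ClosedByInterruptException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            Thread.interrupted(); // clear the flag so the caller is not affected
            closeQuietly(pipe.sink());
            closeQuietly(pipe.source());
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing useful to do in a demo
        }
    }

    public static void main(String[] args) {
        System.out.println("ClosedByInterruptException thrown: " + writeWithInterruptPending());
    }
}
```

        The exception here is a side effect of how the thread was stopped, not a sign that the work itself failed, which matches a task that has already finished and is only shutting down its reporter.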


          People

          • Assignee: Mayank Bansal
          • Reporter: Mayank Bansal