Hadoop Map/Reduce
MAPREDUCE-5

Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: 0.20.2, 1.1.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment:

      Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 4150 (x64) 10-node cluster

      Description

      During the shuffle phase, I'm seeing a long run of the following log messages:

      1) WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
      2) WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
      3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: Committed

      The map phase completes at 100%, and then the reduce phase crawls along, with the above errors in each of the TaskTracker logs. None of the TaskTrackers are lost. When I run non-data jobs like the 'pi' test from the examples jar, everything works fine.

      Attachments

      1. temp.rar (897 kB, uploaded by xieguiming)


          Activity

          George Porter added a comment -

          The relevant sections of the TaskTracker log are:

          2009-05-18 14:59:05,699 INFO org.apache.hadoop.mapred.JvmManager: No new JVM spawned for jobId/taskid: job_200905181452_0002/attempt_200905181452_0002_m_000110_0. Attempting to reuse: jvm_200905181452_0002_m_-1082179983
          2009-05-18 14:59:05,718 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_200905181452_0002_m_-1082179983 given task: attempt_200905181452_0002_m_000110_0
          2009-05-18 14:59:07,536 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed :
          org.mortbay.jetty.EofException
          at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
          at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:566)
          at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
          at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
          at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
          at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2979)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          at org.mortbay.jetty.Server.handle(Server.java:324)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
          at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
          Caused by: java.io.IOException: Broken pipe
          at sun.nio.ch.FileDispatcher.write0(Native Method)
          at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
          at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
          at sun.nio.ch.IOUtil.write(IOUtil.java:75)
          at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
          at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
          at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
          at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
          ... 23 more

          2009-05-18 14:59:07,537 WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed :
          org.mortbay.jetty.EofException
          at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
          at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:566)
          at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
          at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
          at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
          at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2979)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          at org.mortbay.jetty.Server.handle(Server.java:324)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
          at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
          Caused by: java.io.IOException: Broken pipe
          at sun.nio.ch.FileDispatcher.write0(Native Method)
          at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
          at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
          at sun.nio.ch.IOUtil.write(IOUtil.java:75)
          at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
          at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
          at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
          at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
          ... 23 more

          2009-05-18 14:59:07,537 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 0.0.0.0:0, dest: 10.7.2.97:55398, bytes: 65536, op: MAPRED_SHUFFLE, cliID: attempt_200905181452_0002_m_000010_0
          2009-05-18 14:59:07,552 ERROR org.mortbay.log: /mapOutput
          java.lang.IllegalStateException: Committed
          at org.mortbay.jetty.Response.resetBuffer(Response.java:994)
          at org.mortbay.jetty.Response.sendError(Response.java:240)
          at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3002)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          at org.mortbay.jetty.Server.handle(Server.java:324)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
          at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
          2009-05-18 14:59:08,479 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200905181452_0002_m_000101_0 1.0% hdfs://rep4150-90h:9000/sortinput/part-00006:671088640+134217728
          2009-05-18 14:59:09,284 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200905181452_0002_m_000101_0 1.0% hdfs://rep4150-90h:9000/sortinput/part-00006:671088640+134217728
          2009-05-18 14:59:09,285 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_200905181452_0002_m_000101_0 is done.
          2009-05-18 14:59:09,285 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_200905181452_0002_m_000101_0 was 0
          2009-05-18 14:59:09,285 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1

          George Porter added a comment -

          There could be a connection to HADOOP-861.

          Jothi Padmanabhan added a comment -

          The Jetty EofException is observed when the reduce task prematurely closes a connection to a Jetty server. The reduce task may decide to do this when the expected map output is smaller than its in-memory cache limit but cannot be fetched right away, because several other map outputs have already been fetched into memory. In that case it triggers a flush of the existing map outputs from memory to disk and, once the flush is complete, fetches the map output again by reopening the connection to Jetty. In short, such messages are normal.

          This should cause neither a task nor a job failure. Are you seeing failures? Whether these messages appear for a given job depends primarily on the size of the map outputs and the heap size allocated to the reducer.
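
          [Editorial note: the following is a minimal, self-contained sketch of the decision described above, for illustration only. It is not Hadoop's actual shuffle code; the class and method names are hypothetical.]

          class ShuffleFetchSketch {
              // Hypothetical in-memory cache limit for fetched map outputs.
              static final long RAM_LIMIT = 64L << 20;
              long ramUsed = 0;

              byte[] fetchMapOutput(long expectedSize) {
                  if (expectedSize <= RAM_LIMIT && ramUsed + expectedSize > RAM_LIMIT) {
                      // Close the HTTP connection early. On the TaskTracker side,
                      // Jetty is still writing and sees the closed socket as an
                      // EofException caused by "Broken pipe".
                      closeConnection();
                      // Flush the cached map outputs from memory to disk ...
                      spillToDisk();
                      // ... then fall through: reopen the connection and fetch again.
                  }
                  byte[] data = readFromConnection(expectedSize);
                  ramUsed += data.length;
                  return data;
              }

              void closeConnection() { /* abort the in-flight HTTP response */ }
              void spillToDisk()     { ramUsed = 0; /* merge cached outputs to disk */ }
              byte[] readFromConnection(long n) { return new byte[(int) n]; }
          }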

          Olivier Grisel added a comment -

          I can reproduce this on a job, and I confirm that the reduce task fails 3 consecutive times (at 83% completion), causing the job to fail. The memory on my node was nearly full, so this is probably related to the reducer heap size configuration.

          It would be nice if the user could be given a more meaningful error message telling them to allocate more heap space to the reducer.
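
          [Editorial note: as an illustration of that kind of tuning, here is a sketch using Hadoop 1.x property names; the values are examples, not recommendations.]

          import org.apache.hadoop.mapred.JobConf;

          public class ReducerHeapTuning {
              public static JobConf configure() {
                  JobConf conf = new JobConf();
                  // Heap for each child task JVM (maps and reduces in Hadoop 1.x).
                  conf.set("mapred.child.java.opts", "-Xmx1024m");
                  // Fraction of the reducer heap used to buffer map outputs
                  // during the shuffle (Hadoop 1.x default is 0.70).
                  conf.set("mapred.job.shuffle.input.buffer.percent", "0.70");
                  return conf;
              }
          }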

          Allen Wittenauer added a comment -

          If these messages are harmful, then we need to give a better message.

          If these messages aren't harmful, then we shouldn't be throwing a Java exception.

          FWIW, we see these all the time at LinkedIn.

          Ray Young added a comment -

          Is this issue solved? Since filling up memory is always the case in sort-like jobs, would it be preferable to keep the input stream open rather than close it, avoiding the unnecessary reopen? Closing it does not lead to job failure, since the connection is reopened and the data re-transferred, but the hosts involved get regarded as "slow hosts" and put into the penaltyBox, slowing down the whole job.

          Patrick Angeles added a comment -

          > If these messages are harmful, then we need to give a better message.
          > If these messages aren't harmful, then we shouldn't be throwing a Java exception.

          +1

          In particular, logging the entire stack trace bloats the log file. If this is normal behavior, then perhaps a warning or info message could be emitted with no stack trace.
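
          [Editorial note: a sketch of what that could look like, not the actual TaskTracker code; commons-logging style, with hypothetical class and method names. It logs a one-line WARN for the expected disconnects and keeps the stack trace at DEBUG.]

          import org.apache.commons.logging.Log;
          import org.apache.commons.logging.LogFactory;

          public class ShuffleLogging {
              private static final Log LOG = LogFactory.getLog(ShuffleLogging.class);

              static void logAbortedFetch(String attemptId, int reduceId, Exception e) {
                  // Expected client disconnect: one line, no stack trace.
                  LOG.warn("getMapOutput(" + attemptId + "," + reduceId
                           + ") aborted by client: " + e);
                  if (LOG.isDebugEnabled()) {
                      // Full detail only when explicitly enabled.
                      LOG.debug("Stack trace for aborted fetch", e);
                  }
              }
          }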

          Patricio Echague added a comment -

          > If these messages are harmful, then we need to give a better message.
          > If these messages aren't harmful, then we shouldn't be throwing a Java exception.

          +1

          divya added a comment -

          We get the above exception on a cluster setup but not on a single node. Because of it, the performance of the map/reduce job degrades (from 1 min 35 sec to 4 min 24 sec). One difference, though, is in the exception's cause:

          org.mortbay.jetty.EofException

          Caused by: java.io.IOException: Connection reset by peer
          at sun.nio.ch.FileDispatcher.write0(Native Method)

          Does this happen due to a connectivity issue among the nodes?

          Han Qiu added a comment -

          We are getting the same exception in the shuffle, and it causes all reducers to fail; the exception is attached below.

          2012-04-24 15:03:52,163 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.112.195.128:50060, dest: 10.242.102.100:35678, bytes: 65536, op: MAPRED_SHUFFLE, cliID: attempt_201204202043_0002_m_000328_0, duration: 11772847
          2012-04-24 15:03:52,163 ERROR org.mortbay.log: /mapOutput
          java.lang.IllegalStateException: Committed
          at org.mortbay.jetty.Response.resetBuffer(Response.java:1023)
          at org.mortbay.jetty.Response.sendError(Response.java:240)
          at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3718)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
          at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          at org.mortbay.jetty.Server.handle(Server.java:326)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
          at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

          xieguiming added a comment -

          I also hit the same exception, many times over. In the end, the TaskTracker HTTP port (50060) had 83 connections stuck in the CLOSE_WAIT state and could not accept any new connections.

          2012-05-17 15:52:42,513 INFO TaskTracker.clienttrace (TaskTracker.java:doGet(3211)) - src: 172.16.4.7:10060, dest: 172.16.4.8:47138, bytes: 458752, op: MAPRED_SHUFFLE, cliID: attempt_201205061050_13074_m_000179_0
          2012-05-17 15:52:42,514 ERROR mortbay.log (Slf4jLog.java:warn(87)) - /mapOutput
          java.lang.IllegalStateException: Committed
          at org.mortbay.jetty.Response.resetBuffer(Response.java:1023)
          at org.mortbay.jetty.Response.sendError(Response.java:240)
          at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3202)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          at org.mortbay.jetty.Server.handle(Server.java:326)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
          at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
          2012-05-17 15:52:42,510 WARN mapred.TaskTracker (TaskTracker.java:doGet(3198)) - getMapOutput(attempt_201205061050_13076_m_000055_0,4) failed :
          org.mortbay.jetty.EofException: timeout
          at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:548)
          at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:572)
          at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
          at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:651)
          at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
          at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3179)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          at org.mortbay.jetty.Server.handle(Server.java:326)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
          at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

          xieguiming added a comment -

          I have analyzed this problem for a whole day, and here are some more details.

          1> The TT throws the EofException and the IllegalStateException from getMapOutput.

          2> Then, using the netstat command to check the HTTP port (50060), I find 83 connections in the CLOSE_WAIT state, and they never go away, for at least 24 hours.

          3> From the TT log, after printing the exception, the TT HTTP server no longer works properly: it cannot accept any HTTP request (no "sent out" log entries appear afterwards), and the JT adds the node to the blacklist. When I use the curl shell command to access the HTTP service, the client times out, while the DataNode HTTP service on the same node is fine.

          4> I also find the TT's CPU at 100%, even when there is no child JVM.

          5> I also find that reduce tasks on the same node copy data from other nodes more slowly.

          6> After I restart the TT, it works well again.

          I have attached the TT logs; if other logs are needed, let me know. Unfortunately we do not have the matching userlogs, because userlogs are deleted after only 3 hours, and many hours had passed by the time we found the problem.

          Joshua Caplan added a comment -

          Could this be interacting negatively with MAPREDUCE-3851? If the exceptions here really are innocuous, we might want to supply defaults for mapreduce.reduce.shuffle.catch.exception.stack.regex and mapreduce.reduce.shuffle.catch.exception.message.regex that exclude them from consideration when calculating whether the shuffle exception threshold has been reached.
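
          [Editorial note: a sketch of what setting those properties could look like. The property names come from MAPREDUCE-3851; the regex values are placeholders, since choosing patterns that actually exclude the benign EofException depends on how that feature interprets a match.]

          import org.apache.hadoop.mapred.JobConf;

          public class ShuffleExceptionFilter {
              public static JobConf configure() {
                  JobConf conf = new JobConf();
                  // Placeholder patterns: match only exceptions that should count
                  // toward the shuffle exception threshold, so that benign
                  // EofException / "Broken pipe" disconnects are not matched.
                  conf.set("mapreduce.reduce.shuffle.catch.exception.message.regex",
                           "Too many open files");
                  conf.set("mapreduce.reduce.shuffle.catch.exception.stack.regex",
                           ".*\\.doGet.*");
                  return conf;
              }
          }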

          David Parks added a comment -

          I'm encountering this same situation on AWS's mapreduce instance using v1.0.3.

          Steve Watt added a comment -

          For folks looking for a workaround: I am seeing these same exceptions in my logs. I am running a 1 TB TeraSort with 20 reducers on a 4-node cluster. 5 reduce tasks hung in the "Reduce > Sort" phase at 33% progress for a very long time after all the other reduce tasks had completed, hours earlier. All of these reduce tasks were on the same node, which was very low on free memory (200 MB available). The amount of memory allocated to each slot JVM was 1024 MB. Net net, I had over-provisioned the number of slots (as well as the JVM memory allocated to each slot) on this node. To fix this, I greatly reduced the slot allocations on that node.

          Tan Cheng Wee added a comment -

          Would cutting down the number of parallel copies help address this issue?
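
          [Editorial note: for reference, the number of parallel copies is controlled by the Hadoop 1.x property mapred.reduce.parallel.copies (default 5). A sketch of lowering it, following the same JobConf pattern as above; the value is illustrative.]

          import org.apache.hadoop.mapred.JobConf;

          public class ParallelCopies {
              public static JobConf configure() {
                  JobConf conf = new JobConf();
                  // Fewer concurrent fetch threads per reducer (default is 5).
                  conf.set("mapred.reduce.parallel.copies", "2");
                  return conf;
              }
          }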

          Harsh J added a comment - edited

          This one's been stale for a long time, and given the comment below from Jothi, I am marking it as Not a Problem.

          The Jetty EofException is observed when the reduce task prematurely closes a connection to a Jetty server. The reduce task may decide to do this when the expected map output is smaller than its in-memory cache limit but cannot be fetched right away, because several other map outputs have already been fetched into memory. In that case it triggers a flush of the existing map outputs from memory to disk and, once the flush is complete, fetches the map output again by reopening the connection to Jetty. In short, such messages are normal.
          This should cause neither a task nor a job failure. Are you seeing failures? Whether these messages appear for a given job depends primarily on the size of the map outputs and the heap size allocated to the reducer.

          For suppressing the log, as discussed earlier, I've filed: MAPREDUCE-5492.


            People

            • Assignee: Unassigned
            • Reporter: George Porter
            • Votes: 12
            • Watchers: 32
