Hadoop Common / HADOOP-1179

TaskTracker should be restarted if its Jetty HTTP server cannot serve get-map-output files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.3
    • Component/s: None
    • Labels: None

      Description

      Due to some error (a memory leak?), the Jetty HTTP server throws an OutOfMemoryError when serving get-map-output requests:

      2007-03-28 20:42:39,608 WARN org.mortbay.jetty.servlet.ServletHandler: Error for /mapOutput?map=task_0334_m_013127_0&reduce=591
      2007-03-28 20:46:42,788 WARN org.mortbay.jetty.servlet.ServletHandler: Error for /mapOutput?map=task_0334_m_013127_0&reduce=591
      2007-03-28 20:49:38,064 WARN org.mortbay.jetty.servlet.ServletHandler: Error for
      java.lang.OutOfMemoryError
              at java.io.FileInputStream.readBytes(Native Method)
              at java.io.FileInputStream.read(FileInputStream.java:199)
              at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:119)
              at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
              at java.io.DataInputStream.read(DataInputStream.java:132)
              at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:182)
              at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
              at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
              at java.io.DataInputStream.readFully(DataInputStream.java:178)
              at java.io.DataInputStream.readLong(DataInputStream.java:399)
              at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1643)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
              at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
              at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
              at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
              at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
              at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
              at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
              at org.mortbay.http.HttpServer.service(HttpServer.java:954)
              at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
              at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
              at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
              at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
              at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
              at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
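
      For reference, a reduce fetches each map's output from the serving task tracker with a plain HTTP GET against the /mapOutput path seen in the log above. The sketch below shows such a fetch; the host, port, and the copy loop are illustrative placeholders, not values or code taken from this issue.

      import java.io.InputStream;
      import java.net.HttpURLConnection;
      import java.net.URL;

      public class MapOutputFetchSketch {
          public static void main(String[] args) throws Exception {
              // Hypothetical tracker address; the query string mirrors the
              // /mapOutput?map=...&reduce=... requests in the log above.
              URL url = new URL("http://tracker-host:50060/mapOutput"
                      + "?map=task_0334_m_013127_0&reduce=591");
              HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              try (InputStream in = conn.getInputStream()) {
                  byte[] buf = new byte[64 * 1024];
                  long total = 0;
                  int n;
                  while ((n = in.read(buf)) != -1) {
                      total += n; // a real reduce would spill these bytes to memory or disk
                  }
                  System.out.println("fetched " + total + " bytes of map output");
              } finally {
                  conn.disconnect();
              }
          }
      }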

      In this case, the task tracker cannot send out the map output files on that machine, rendering it useless.
      Moreover, all the reduces that depend on those map output files are simply stuck.
      If the task tracker reports the failure to the job tracker, the map/reduce job can recover.
      If the task tracker is restarted, it can rejoin the cluster as a new member.
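
      One way the requested behavior could look is sketched below: catch the fatal error inside the servlet, tell the job tracker that this tracker's map outputs are lost (so dependent reduces re-fetch elsewhere and the job recovers), and exit so the tracker can be restarted and rejoin as a new member. All names here are hypothetical; this is a sketch of the idea, not the attached 1179.patch.

      import java.io.IOException;
      import javax.servlet.http.HttpServlet;
      import javax.servlet.http.HttpServletRequest;
      import javax.servlet.http.HttpServletResponse;

      public class MapOutputServletSketch extends HttpServlet {
          @Override
          protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                  throws IOException {
              try {
                  serveMapOutput(req, resp); // stream the requested map output file
              } catch (OutOfMemoryError fatal) {
                  // Without this, the tracker keeps its place in the cluster but can
                  // no longer serve any map output, and every dependent reduce hangs.
                  reportFatalErrorToJobTracker(fatal);
                  System.exit(-1); // stand-in for an orderly restart of the tracker
              }
          }

          private void serveMapOutput(HttpServletRequest req, HttpServletResponse resp)
                  throws IOException {
              // elided: open the local map output file named by the query
              // parameters and copy its bytes to the response
          }

          private void reportFatalErrorToJobTracker(Throwable t) {
              // elided: RPC to the job tracker marking this tracker's outputs lost
          }
      }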

      Attachments

      1. 1179.patch (3 kB, Devaraj Das)

        Activity

        Runping Qi created issue -
        Runping Qi made changes -
        Description edited. The original description carried a different stack trace (an OutOfMemoryError thrown during the in-memory merge on the reduce side, rather than in the Jetty servlet); the prose around it was unchanged:

        2007-03-28 12:28:06,642 WARN org.apache.hadoop.mapred.TaskRunner: task_0334_r_000379_0 Intermediate Merge of the inmemory files threw an exception: java.lang.OutOfMemoryError
                at java.io.FileOutputStream.writeBytes(Native Method)
                at java.io.FileOutputStream.write(FileOutputStream.java:260)
                at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:166)
                at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
                at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
                at java.io.DataOutputStream.write(DataOutputStream.java:90)
                at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:391)
                at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
                at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
                at java.io.DataOutputStream.write(DataOutputStream.java:90)
                at org.apache.hadoop.io.SequenceFile$CompressedBytes.writeCompressedBytes(SequenceFile.java:492)
                at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.appendRaw(SequenceFile.java:903)
                at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2227)
                at org.apache.hadoop.mapred.ReduceTaskRunner$InMemFSMergeThread.run(ReduceTaskRunner.java:838)

        (The remainder of the original description, and the new value, are identical to the current description shown above.)

        Devaraj Das made changes -
        Resolution: Duplicate; Status: Open → Resolved
        Devaraj Das made changes -
        Assignee: Devaraj Das; Status: Resolved → Reopened; Resolution: Duplicate cleared
        Devaraj Das made changes -
        Attachment: 1179.patch added
        Nigel Daley made changes -
        Fix Version/s: 0.12.3
        Owen O'Malley made changes -
        Status: Reopened → Patch Available
        Tom White made changes -
        Status: Patch Available → Resolved; Resolution: Fixed
        Doug Cutting made changes -
        Status: Resolved → Closed

          People

          • Assignee: Devaraj Das
          • Reporter: Runping Qi
          • Votes: 0
          • Watchers: 0
