Hadoop Map/Reduce / MAPREDUCE-6107

Job history server becomes unresponsive due to stuck thread in epollWait

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: jobhistoryserver
    • Labels: None

    Description

      About once a week, the job history server (JHS) becomes unresponsive on one of our 2000-node Hadoop clusters. In the thread dump, multiple threads are blocked on locks held by a couple of other threads, which in turn are stuck indefinitely in epollWait while talking to HDFS to fetch a history file.
      When the number of blocked threads reaches the thread pool size, the JHS stops responding to new client requests.
      Thread dump attached.
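
      To illustrate the failure mode described above, here is a minimal, self-contained sketch in plain Java. It does not use any Hadoop or JHS code: the class name, the `HISTORY_LOCK` object, and the latch that stands in for an HDFS read that never returns are all hypothetical. It only shows how one lock holder stuck in I/O can tie up every handler in a fixed-size pool, so that a newly submitted request is never served.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerPoolExhaustion {
    // Hypothetical stand-in for the monitor the blocked handlers are waiting on.
    private static final Object HISTORY_LOCK = new Object();

    /** Returns true if a new request is still unanswered after timeoutMillis. */
    static boolean newRequestStarves(int poolSize, long timeoutMillis) throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(poolSize, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // do not keep the JVM alive for this demo
            return t;
        });
        CountDownLatch neverReleased = new CountDownLatch(1); // simulates a read that never returns
        CountDownLatch holderHasLock = new CountDownLatch(1);
        try {
            // One handler takes the lock and then hangs on the simulated I/O.
            handlers.submit(() -> {
                synchronized (HISTORY_LOCK) {
                    holderHasLock.countDown();
                    try {
                        neverReleased.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            holderHasLock.await();

            // The remaining handlers pile up waiting for the same lock.
            for (int i = 1; i < poolSize; i++) {
                handlers.submit(() -> {
                    synchronized (HISTORY_LOCK) {
                        // would read the cached history file here
                    }
                });
            }

            // Every pool thread is now busy or blocked, so this request only sits in the queue.
            Future<String> newRequest = handlers.submit(() -> "ok");
            try {
                newRequest.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            }
        } finally {
            neverReleased.countDown(); // let the demo threads finish
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("new request starved: " + newRequestStarves(4, 500));
    }
}
```

      In this sketch the pool recovers only because the demo releases the latch at the end; in the situation reported here nothing releases the stuck read.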

      Has anyone seen this before?

      Here is one of the threads stuck in epollWait:

      "IPC Server handler 4 on 10020" daemon prio=10 tid=0x00007f7eb10f5000 nid=0x144d runnable [0x00007f7e9108d000]
         java.lang.Thread.State: RUNNABLE
              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
              - locked <0x00000006c89d3240> (a sun.nio.ch.Util$2)
              - locked <0x00000006c89d3228> (a java.util.Collections$UnmodifiableSet)
              - locked <0x00000006bb32f8b8> (a sun.nio.ch.EPollSelectorImpl)
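
      For anyone going through the attached dump, the following is a rough sketch of how the "who blocks whom" relationship could be extracted from jstack output. It assumes the usual HotSpot text format, where each thread starts with a quoted name and monitors appear as `- locked <0x...>` and `- waiting to lock <0x...>` lines. The class and method names are hypothetical, and it has not been run against the attached jstack.log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JstackLockSummary {
    private static final Pattern THREAD_HEADER = Pattern.compile("^\\s*\"([^\"]+)\".*");
    private static final Pattern LOCKED =
            Pattern.compile("^\\s*- locked <(0x[0-9a-fA-F]+)>.*");
    private static final Pattern WAITING =
            Pattern.compile("^\\s*- waiting to lock <(0x[0-9a-fA-F]+)>.*");

    /** Maps each lock-holding thread to the names of the threads waiting on its locks. */
    static Map<String, List<String>> blockedBy(List<String> dumpLines) {
        Map<String, String> ownerOfLock = new LinkedHashMap<>();
        Map<String, List<String>> waitersOfLock = new LinkedHashMap<>();
        String current = null; // name of the thread whose stack is being read

        for (String line : dumpLines) {
            Matcher m;
            if ((m = THREAD_HEADER.matcher(line)).matches()) {
                current = m.group(1);
            } else if (current != null && (m = LOCKED.matcher(line)).matches()) {
                ownerOfLock.put(m.group(1), current);
            } else if (current != null && (m = WAITING.matcher(line)).matches()) {
                waitersOfLock.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(current);
            }
        }

        // Join the two maps on the lock address.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : waitersOfLock.entrySet()) {
            String owner = ownerOfLock.getOrDefault(e.getKey(), "<unknown owner>");
            result.computeIfAbsent(owner, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JstackLockSummary jstack.log
        Map<String, List<String>> summary = blockedBy(Files.readAllLines(Paths.get(args[0])));
        summary.forEach((owner, waiters) ->
                System.out.println(owner + " blocks " + waiters.size() + " thread(s): " + waiters));
    }
}
```

      A holder that blocks many threads and is itself RUNNABLE in epollWait would match the pattern described in this report.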
      

      Attachments

        1. jstack.log
          1.27 MB
          Ashwin Shankar

          People

            Assignee: Unassigned
            Reporter: Ashwin Shankar (ashwinshankar77)
            Votes: 0
            Watchers: 4