  1. Hadoop HDFS
  2. HDFS-5500

Critical datanode threads may terminate silently on uncaught exceptions


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved

    Description

      We have seen the refreshUsed (DU) thread disappear after an uncaught exception, and this can go unnoticed for a long time. If an OutOfMemoryError (OOM) occurs, more things can go wrong. On one occasion, the Timer thread, multiple refreshUsed threads, and the DataXceiverServer thread had all terminated.

      DataXceiverServer catches OutOfMemoryError and sleeps for 30 seconds, but I am not sure this really helps. In one case, the thread did this multiple times and then terminated. I suspect another OOM was thrown while it was in the catch block. As a result, the server socket was never closed and clients hung on connect. If the thread had at least closed the socket, the client side would have been impacted less.
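
One possible mitigation for the two problems described above (silent thread death and an unclosed server socket) is sketched below. This is only an illustration, not the datanode's actual code: the class `CriticalThreads` and the methods `closeOnExit` and `start` are hypothetical names, and the sketch uses only standard `java.lang.Thread` and `java.io.Closeable` APIs. It assumes the thread's body can be wrapped as a `Runnable` and that the resource to release (for example a `ServerSocket`) is a `Closeable`.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper, not part of Hadoop.
public class CriticalThreads {

    // Wraps a thread body so that the resource is closed when the body
    // exits for any reason, including an uncaught exception or Error.
    static Runnable closeOnExit(Runnable body, Closeable resource) {
        return () -> {
            try {
                body.run();
            } finally {
                try {
                    resource.close();
                } catch (IOException ignored) {
                    // Best effort: the thread is going away regardless.
                }
            }
        };
    }

    // Starts a named thread whose death from an uncaught Throwable is
    // reported to the given handler instead of going unnoticed.
    static Thread start(String name, Runnable body,
                        Thread.UncaughtExceptionHandler onDeath) {
        Thread t = new Thread(body, name);
        t.setUncaughtExceptionHandler(onDeath);
        t.start();
        return t;
    }
}
```

With a wrapper like this, an accept loop could be started as `CriticalThreads.start("DataXceiverServer", CriticalThreads.closeOnExit(loop, serverSocket), handler)`, where `loop`, `serverSocket`, and `handler` are placeholders. The handler might log the failure or shut the process down. Under a real OOM the `finally` block or the handler can itself fail to run to completion, so this reduces the window for silent failure rather than eliminating it.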


          People

            Assignee: Unassigned
            Reporter: Kihwal Lee (kihwal)
            Votes: 0
            Watchers: 11

            Dates

              Created:
              Updated: