Hadoop Common / HADOOP-3232

Datanodes time out


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.16.2
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Environment: 10 node cluster + 1 namenode

    • Hadoop Flags: Reviewed
    • Release Note: Changed the 'du' command to run in a separate thread so that it does not block the user.
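
The release note only states that 'du' was moved to a separate thread; the patch itself is not reproduced on this page. The following is a minimal sketch of that general idea, not the code from the attached patches: a slow disk-usage measurement runs on a background thread and callers read the last cached value. All names here (`CachedDiskUsage`, `duBytes`, the refresh interval) are hypothetical and are not taken from Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper: caches a slow disk-usage measurement and refreshes it off the caller's thread. */
final class CachedDiskUsage {
    private final AtomicLong usedBytes = new AtomicLong(0L);
    private final LongSupplier measure;
    private final ScheduledExecutorService refresher;

    CachedDiskUsage(LongSupplier measure) {
        this.measure = measure;
        // Daemon thread, so a stuck measurement never keeps the JVM alive.
        this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "du-refresh");
            t.setDaemon(true);
            return t;
        });
    }

    /** Starts periodic refreshes; the first one is scheduled immediately on the background thread. */
    void start(long intervalMillis) {
        refresher.scheduleWithFixedDelay(this::refresh, 0L, intervalMillis, TimeUnit.MILLISECONDS);
    }

    private void refresh() {
        try {
            usedBytes.set(measure.getAsLong());
        } catch (RuntimeException e) {
            // Keep the previous value; an uncaught exception would cancel all future refreshes.
        }
    }

    /** Returns the last measured value (0 until the first refresh completes); never waits. */
    long getUsed() {
        return usedBytes.get();
    }

    void shutdown() {
        refresher.shutdownNow();
    }

    /** Measures a directory by running `du -sk` (-s: summary only, -k: 1024-byte units). */
    static long duBytes(String dir) {
        try {
            Process p = new ProcessBuilder("du", "-sk", dir).redirectErrorStream(true).start();
            String last = null;
            try (BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                // Drain all output; the summary is the last line, after any error messages.
                for (String line = in.readLine(); line != null; line = in.readLine()) {
                    last = line;
                }
            }
            p.waitFor();
            if (last == null) {
                throw new IllegalStateException("no output from du");
            }
            return Long.parseLong(last.trim().split("\\s+")[0]) * 1024L;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would construct it with something like `new CachedDiskUsage(() -> CachedDiskUsage.duBytes("/some/data/dir"))` (the path is a placeholder), call `start(...)` once, and then read `getUsed()` from the latency-sensitive thread. The trade-off in this sketch is that the reported value can be stale by up to one refresh interval plus the duration of a measurement.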

    Description

      I recently upgraded our 10 node cluster from 0.15.2 to 0.16.2.
      Unfortunately, we are now seeing datanode timeouts. In previous versions we often saw in the namenode web UI that the "last contact" value of one or two datanodes climbed from the usual 0-3 sec to ~200-300 sec before dropping back to 0.

      That is only a mild annoyance, but the real problems appear when all nodes do this at once, as happened a few times after the upgrade.
      It was suggested that namenode garbage collection could be the cause, but the GC log output does not appear to support that.

      Attachments

        1. hadoop-hadoop-datanode.out (114 kB, Johan Oskarsson)
        2. hadoop-hadoop-namenode-master2.out (25 kB, Johan Oskarsson)
        3. hadoop-hadoop-datanode-new.out (60 kB, Johan Oskarsson)
        4. hadoop-hadoop-datanode-new.log (6.15 MB, Johan Oskarsson)
        5. du-nonblocking-v1.patch (8 kB, Johan Oskarsson)
        6. du-nonblocking-v2-trunk.patch (4 kB, Johan Oskarsson)
        7. du-nonblocking-v4-trunk.patch (6 kB, Johan Oskarsson)
        8. du-nonblocking-v5-trunk.patch (6 kB, Johan Oskarsson)
        9. du-nonblocking-v6-trunk.patch (7 kB, Johan Oskarsson)

          People

            Assignee: Johan Oskarsson (johanoskarsson)
            Reporter: Johan Oskarsson (johanoskarsson)
            Votes: 1
            Watchers: 5
