HADOOP-285 (Hadoop Common)

Data nodes cannot re-join the cluster once connection is lost


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.3.2
    • Component/s: None
    • Labels: None

    Description

      A data node loses its connection to the name node and then calls offerService() again.
      The HADOOP-270 changes make it start dataXceiveServer again, but that thread has already been started,
      so the call throws IllegalThreadStateException. This repeats in a loop, and the heartbeat section is never reached.
      As a result, the data node never re-joins the cluster, even though from the outside it appears to be running.
      This is another reason why we see missing data but do not see failed data nodes.
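The failure mode can be reproduced in isolation: a `java.lang.Thread` may be started only once, and calling `start()` on it a second time throws `IllegalThreadStateException`. The following is a hypothetical, simplified model, not Hadoop's `DataNode` code and not the attached `datanode.patch`; the class `DataNodeRestartSketch`, the methods `offerServiceUnguarded()` and `offerServiceGuarded()`, the `xceiverStarted` flag and the heartbeat counter are illustrative names only. It contrasts re-entering a service loop that unconditionally starts the server thread with one that starts it only once.

```java
/**
 * Hypothetical, simplified model of the HADOOP-285 failure (not Hadoop's
 * DataNode code and not the attached patch). A java.lang.Thread can be
 * started only once, so re-entering a service loop that calls start()
 * again throws IllegalThreadStateException.
 */
public class DataNodeRestartSketch {
    // Stand-in for the dataXceiveServer thread.
    private final Thread dataXceiveServer;
    // Hypothetical flag recording whether the server thread was started.
    private boolean xceiverStarted = false;
    // Counts how often the (simulated) heartbeat section is reached.
    private int heartbeats = 0;

    public DataNodeRestartSketch() {
        dataXceiveServer = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // pretend to serve block transfers
            } catch (InterruptedException e) {
                // interrupted: stop serving and let the thread exit
            }
        }, "dataXceiveServer");
        dataXceiveServer.setDaemon(true); // do not keep the JVM alive
    }

    /** Buggy variant: starts the server thread on every call. */
    void offerServiceUnguarded() {
        dataXceiveServer.start(); // throws IllegalThreadStateException on re-entry
        heartbeats++;             // never reached after a reconnect
    }

    /** Guarded variant: starts the server thread only on the first call. */
    synchronized void offerServiceGuarded() {
        if (!xceiverStarted) {
            dataXceiveServer.start();
            xceiverStarted = true;
        }
        heartbeats++; // the heartbeat section is reached on every call
    }

    /** Returns true if a second unguarded call fails before its heartbeat. */
    public static boolean unguardedRestartThrows() {
        DataNodeRestartSketch dn = new DataNodeRestartSketch();
        dn.offerServiceUnguarded(); // initial connection to the name node
        try {
            dn.offerServiceUnguarded(); // re-entry after the connection was lost
            return false;
        } catch (IllegalThreadStateException e) {
            return dn.heartbeats == 1; // only the first heartbeat happened
        } finally {
            dn.dataXceiveServer.interrupt();
        }
    }

    /** Calls the guarded variant `calls` times and returns the heartbeat count. */
    public static int guardedHeartbeats(int calls) {
        DataNodeRestartSketch dn = new DataNodeRestartSketch();
        for (int i = 0; i < calls; i++) {
            dn.offerServiceGuarded(); // each call simulates a reconnect
        }
        dn.dataXceiveServer.interrupt();
        return dn.heartbeats;
    }

    public static void main(String[] args) {
        System.out.println("unguarded re-entry throws: " + unguardedRestartThrows());
        System.out.println("guarded heartbeats after 3 calls: " + guardedHeartbeats(3));
    }
}
```

Running `main` shows that the unguarded re-entry throws before reaching its heartbeat, while the guarded variant reaches the heartbeat on each of the three calls. The sketch uses an explicit flag rather than `Thread.isAlive()` because a thread that has already terminated cannot be restarted either, so `isAlive()` alone would not make a second `start()` safe.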

      Attachments

        1. datanode.patch (2 kB), attached by Hairong Kuang


          People

            Assignee: Hairong Kuang (hairong)
            Reporter: Konstantin Shvachko (shv)
            Votes: 0
            Watchers: 0

