  1. Hadoop HDFS
  2. HDFS-15060

namenode doesn't retry JN when other JN goes down


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      While upgrading Hadoop to a new version (following, for example, the instructions at https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#namenode_-rollingUpgrade), I ran into the following situation.

      I upgrade the JournalNodes (JNs) one by one:

      1. Upgrade and restart JN1
      2. The NN sees JN1 go offline:
        WARN client.QuorumJournalManager: Remote journal 10.73.67.132:8485 failed to write txns 1205396-1205399. Will try to write to this JN again after the next log roll.
      3. No log roll happens for some time (at least 1 min)
      4. Upgrade and restart JN2
      5. The NN sees JN2 go offline as well:
        WARN client.QuorumJournalManager: Remote journal 10.73.67.68:8485 failed to write txns 1205799-1205800. Will try to write to this JN again after the next log roll.
      6. But at this point the NN no longer has a JN quorum:
        FATAL namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.73.67.212:8485, 10.73.67.132:8485, 10.73.67.68:8485], stream=QuorumOutputStream starting at txid 1205246))
        10.73.67.212:8485: null [success]
        2 exceptions thrown:
        10.73.67.132:8485: Journal disabled until next roll
        10.73.67.68:8485: End of File Exception between local host is: "srv05.lt01.gismt.crpt.tech/10.73.67.132"; destination host is: "srv07.lt01.gismt.crpt.tech":8485; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
        This happens even though JN1 is already back online.

      It looks like the NN should retry the JNs it has marked as offline before giving up.
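
To make the scenario above concrete, here is a toy Java model of it. This is only a sketch and not Hadoop's actual `QuorumJournalManager` code: the class `QuorumRetrySketch`, the `Journal` type, and all method names are hypothetical. It assumes three JNs, a majority of two, and a per-JN "disabled until next roll" flag on the NN side. It also ignores how a rejoining JN would catch up on the transactions it missed, which is presumably the reason the real client waits for a log roll.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; hypothetical names, not Hadoop code. */
final class QuorumRetrySketch {

    /** Stand-in for one JournalNode as seen by the NameNode. */
    static final class Journal {
        final String name;
        boolean reachable = true;          // is the JN process up?
        boolean disabledUntilRoll = false; // NN-side flag set after a failed write

        Journal(String name) { this.name = name; }
    }

    /** Acks needed for a write to succeed: 2 of 3, 3 of 5, ... */
    static int majority(int journalCount) {
        return journalCount / 2 + 1;
    }

    /** Reported behavior: a journal that failed once is skipped until the next roll. */
    static boolean writeWithoutRetry(List<Journal> journals) {
        int acks = 0;
        for (Journal j : journals) {
            if (j.disabledUntilRoll) {
                continue;                  // "Journal disabled until next roll"
            }
            if (j.reachable) {
                acks++;
            } else {
                j.disabledUntilRoll = true;
            }
        }
        return acks >= majority(journals.size());
    }

    /** Proposed behavior: on quorum loss, try every journal again before giving up. */
    static boolean writeWithRetry(List<Journal> journals) {
        if (writeWithoutRetry(journals)) {
            return true;
        }
        int acks = 0;
        for (Journal j : journals) {
            if (j.reachable) {
                j.disabledUntilRoll = false; // the journal is back, so re-enable it
                acks++;
            }
        }
        return acks >= majority(journals.size());
    }

    /** Replays steps 1-6 and returns whether the final write succeeds. */
    static boolean replay(boolean retry) {
        Journal jn1 = new Journal("JN1");
        Journal jn2 = new Journal("JN2");
        Journal jn3 = new Journal("JN3");
        List<Journal> jns = new ArrayList<>(List.of(jn1, jn2, jn3));

        jn1.reachable = false;             // steps 1-2: JN1 restarts
        if (retry) writeWithRetry(jns); else writeWithoutRetry(jns);

        jn1.reachable = true;              // step 3: JN1 is back, but no log roll
        jn2.reachable = false;             // steps 4-5: JN2 restarts
        return retry ? writeWithRetry(jns) : writeWithoutRetry(jns);
    }

    public static void main(String[] args) {
        System.out.println("without retry: " + replay(false)); // false
        System.out.println("with retry:    " + replay(true));  // true
    }
}
```

In this model the final write fails without a retry because only JN3 acknowledges it, while JN1 is skipped despite being reachable. With a retry, JN1 and JN3 together form a majority again.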

          People

            Assignee: Unassigned
            Reporter: Andrew Timonin (atimonin)