Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • QuorumJournalManager (HDFS-3077)
    • None
    • None
    • Reviewed

    Description

      While running stress tests, I hit a failover problem that occurs when the current edit log segment written by the old active NameNode is large. With a 327MB log segment containing 6.4M transactions, the JournalNode (JN) took ~11 seconds to read and validate it during the recovery step. That exceeded the 10-second timeout for createNewEpoch, so the recovery failed.
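
      To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes validation time scales linearly with segment size, which is only an approximation, and the function names are hypothetical helpers for illustration, not part of HDFS.

```python
# Figures taken from the report above; the ~11 s value is approximate.
SEGMENT_MB = 327
TRANSACTIONS = 6_400_000
VALIDATION_SECONDS = 11.0
TIMEOUT_SECONDS = 10.0  # createNewEpoch timeout described in the report


def validation_rate_mb_per_s(segment_mb: float, seconds: float) -> float:
    """Observed read-and-validate throughput in MB per second."""
    return segment_mb / seconds


def max_segment_mb_within(timeout_s: float, rate_mb_per_s: float) -> float:
    """Largest segment that could be validated before the timeout,
    ASSUMING validation cost grows linearly with segment size."""
    return timeout_s * rate_mb_per_s


rate = validation_rate_mb_per_s(SEGMENT_MB, VALIDATION_SECONDS)
limit = max_segment_mb_within(TIMEOUT_SECONDS, rate)
txns_per_second = TRANSACTIONS / VALIDATION_SECONDS

print(f"validation rate: {rate:.1f} MB/s, {txns_per_second:,.0f} txns/s")
print(f"segment size that fits in the timeout: {limit:.0f} MB")
print(f"observed segment exceeds that limit: {SEGMENT_MB > limit}")
```

      Under that linear assumption, the JN validated roughly 30 MB/s, so any in-progress segment larger than about 297 MB would miss the 10-second window on comparable hardware.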

      Attachments

        1. hdfs-3906.txt
          24 kB
          Todd Lipcon

          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 3
