Hadoop HDFS

QJM should validate startLogSegment() more strictly

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 2.1.0-beta
    • Fix Version/s: None
    • Component/s: qjm
    • Labels: None

      Description

      On a small handful of occasions we've seen a case where one of the NNs in an HA cluster ends up with an fsimage checkpoint that falls in the middle of an edit segment. We're not sure yet how this happens, but one issue can happen as a result:

      • Node has fsimage_500. Cluster has edits_1-1000, edits_1001_inprogress
      • Node restarts, loads fsimage_500
      • Node wants to become active. It calls selectInputStreams(500). Currently, this API logs a WARN that 500 falls in the middle of the 1-1000 segment, but continues and returns no results.
      • Node calls startLogSegment(501).

      Currently, the QJM will accept this (incorrectly). The node then crashes when it first tries to journal a real transaction, but it ends up leaving the edits_501_inprogress lying around, potentially causing more issues later.
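The stricter validation the title asks for can be sketched as a simple precondition check before opening a segment. This is an illustrative outline, not the actual HDFS-5058 patch; the field names (curSegmentTxId, highestWrittenTxId) are hypothetical stand-ins for the journal's real bookkeeping.

```java
// Minimal sketch of strict startLogSegment() validation on the journal side.
public class StartSegmentCheck {

    static class JournalState {
        long curSegmentTxId = -1;    // txid that opened the current segment; -1 if none open
        long highestWrittenTxId = 0; // last txid durably written to the journal

        void startLogSegment(long txid) {
            if (curSegmentTxId != -1) {
                throw new IllegalStateException("Can't start segment at txid " + txid
                        + ": segment starting at txid " + curSegmentTxId + " is still open");
            }
            if (txid != highestWrittenTxId + 1) {
                // In the scenario above this check fires: the NN asks for 501
                // while the journal has already written edits up to 1000.
                throw new IllegalStateException("Can't start segment at txid " + txid
                        + ": expected " + (highestWrittenTxId + 1));
            }
            curSegmentTxId = txid;
        }
    }

    public static void main(String[] args) {
        JournalState js = new JournalState();
        js.highestWrittenTxId = 1000;
        try {
            js.startLogSegment(501); // the buggy request from the scenario above
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        js.startLogSegment(1001);    // valid: immediately follows txid 1000
    }
}
```

With a check like this, the node fails fast at startLogSegment(501) instead of crashing later and leaving edits_501_inprogress behind.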

      1. hdfs-5058.txt (6 kB, Todd Lipcon)

        Activity

        Todd Lipcon added a comment -

        Fixing this at the QJM side is pretty easy - just need to add a few more checks.

        We should also re-evaluate the selectInputStreams() API when called in the middle of a segment. Perhaps it should return the full segment, and fast-forward into it to the correct transaction? That would have also helped this. But, either way, the extra sanity checks are valuable.
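The fast-forward idea above can be sketched with plain lists of txids standing in for real edit-log streams. This is a toy model, not the HDFS API; selectInputStream and segmentTxIds here are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class FastForwardSketch {
    // Illustrative finalized segment: txids first..last inclusive.
    static List<Long> segmentTxIds(long first, long last) {
        List<Long> txids = new ArrayList<>();
        for (long t = first; t <= last; t++) txids.add(t);
        return txids;
    }

    // Instead of refusing a mid-segment read, return the whole segment
    // and fast-forward past the records before fromTxId.
    static List<Long> selectInputStream(List<Long> segment, long fromTxId) {
        List<Long> result = new ArrayList<>();
        for (long t : segment) {
            if (t >= fromTxId) result.add(t); // skip txids already in the fsimage
        }
        return result;
    }

    public static void main(String[] args) {
        // The scenario from the description: fsimage_500 plus segment 1-1000.
        List<Long> stream = selectInputStream(segmentTxIds(1, 1000), 501);
        System.out.println(stream.get(0) + ".." + stream.get(stream.size() - 1));
        // prints "501..1000"
    }
}
```

Under this behavior, selectInputStreams(500) would hand back txids 501-1000 rather than an empty result, so the node would replay the tail of the segment instead of trying to start a new one mid-stream.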

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12595650/hdfs-5058.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4759//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4759//console

        This message is automatically generated.

        Fengdong Yu added a comment -

        I suppose you restarted HDFS on the standby NN, right?

        Todd Lipcon added a comment -

        Yep, the problem occurs if you restart the SBN and then try to transition it to active after you've loaded an fsimage that fell in the middle of a log segment.

        Fengdong Yu added a comment -

        Todd,
        I looked at the patch; it adds more condition checks and throws exceptions. I'd just like to offer some advice:

        Even if we add more checks and throw the related exceptions, a new administrator who restarts the SBN and runs into these exception messages won't know what to do; he/she will only know that the SBN cannot start normally, or cannot transition to active.

        So could you set a boolean exception flag, initially false, and on each failed check just set the flag to true instead of throwing? Then throw a single exception after all the checks have finished, and add an additional explanation to the message: "please copy {namenode.name.dir}/* from the active NN to the standby NN to solve this problem."

        Todd Lipcon added a comment -

        Hi Fengdong. I think improving the user experience of broken setups is a different task than this JIRA, which is just a bug fix. I don't want to scope-creep this, since it's an important fix for data safety.

        Additionally, always telling the admin to copy the data dir between nodes is dangerous – once we're in an inconsistent state, an expert should really look at it to determine the correct recovery. Giving resolution advice in an error message is risky: since we're already in a bad state, we may end up giving the wrong advice.

        Fengdong Yu added a comment -

        Yes, Todd, I absolutely agree with you.

        To solve this problem for good, we should sync all transactions from the active NN to the SBN while shutting down HDFS, right?
        If so, can we open another JIRA for it?

        Todd Lipcon added a comment -

        HDFS-5074 explains the way in which we end up with a mid-segment checkpoint and should also solve this issue – it will return the 1-1000 segment from selectInputStreams and properly read it at startup. But, this fix is still good to add as an extra safety guard.


          People

          • Assignee: Todd Lipcon
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 8
