HBase
  1. HBase
  2. HBASE-5323

Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.90.5
    • Fix Version/s: 0.90.8
    • Component/s: None
    • Labels:
      None

      Description

      We know that while parsing the HLog we expect the proper length from HDFS.
      In WALReaderFSDataInputStream

                    assert(realLength >= this.length);
      

      We are trying to come out if the above condition is not satisfied. But if SSH.splitLog() gets this problem then it lands in the run method of EventHandler. This kills the SSH thread and so further assignment does not happen. If ROOT and META are to be assigned they cannot be.
      I think in this condition we abort the master by catching such exceptions.
      Please do suggest.

      1. HBASE-5323.patch
        9 kB
        ramkrishna.s.vasudevan
      2. HBASE-5323.patch
        0.9 kB
        ramkrishna.s.vasudevan

        Activity

        Hide
        Lars Hofhansl added a comment -

        Removing from 0.94 since there is no movement.

        Show
        Lars Hofhansl added a comment - Removing from 0.94 since there is no movement.
        Hide
        Lars Hofhansl added a comment -

        On to the greener pastures of 0.94.4

        Show
        Lars Hofhansl added a comment - On to the greener pastures of 0.94.4
        Hide
        Lars Hofhansl added a comment -

        Moving out to 0.94.3.
        @Ram: So you're seeing this in production or in a test? Are you running with assertions enabled in production?

        Show
        Lars Hofhansl added a comment - Moving out to 0.94.3. @Ram: So you're seeing this in production or in a test? Are you running with assertions enabled in production?
        Hide
        Jonathan Hsieh added a comment -

        If it isn't making 0.94.1 then I'm going to bump it from 0.90.7.

        Show
        Jonathan Hsieh added a comment - If it isn't making 0.94.1 then I'm going to bump it from 0.90.7.
        Hide
        Lars Hofhansl added a comment -

        Thanks Ram... Let's try for the next release.
        I'll leave it to Jon whether he wants this in 0.90.7.

        Show
        Lars Hofhansl added a comment - Thanks Ram... Let's try for the next release. I'll leave it to Jon whether he wants this in 0.90.7.
        Hide
        ramkrishna.s.vasudevan added a comment -

        Currently am not working on this Lars. You need not include in RC. Thanks

        Show
        ramkrishna.s.vasudevan added a comment - Currently am not working on this Lars. You need not include in RC. Thanks
        Hide
        Lars Hofhansl added a comment -

        Are you still working on this Ram? Do you think this needs to go into 0.94?

        Show
        Lars Hofhansl added a comment - Are you still working on this Ram? Do you think this needs to go into 0.94?
        Hide
        Ted Yu added a comment -

        I would suggest starting with a patch for TRUNK which can be verified by Hadoop QA.
        Once code review passes, you can backport to 0.90 branch.

        Show
        Ted Yu added a comment - I would suggest starting with a patch for TRUNK which can be verified by Hadoop QA. Once code review passes, you can backport to 0.90 branch.
        Hide
        Ted Yu added a comment -

        Some comments for patch v2.
        In MasterFileSystem.java, the following check should be put as first in if statement because it is fast:

        e instanceof HLogLengthMisMatchException
        

        In HLog.java:

        +      if (e instanceof HLogLengthMisMatchException) {
        +        throw new HLogLengthMisMatchException(
        

        I don't think we need the above. Instead, we should check for e.cause in MasterFileSystem.java

        For HLogLengthMisMatchException.java, the M in Match should be lower case.
        I think it should be a generic exception. So maybe rename it to FileLengthMismatchException ?

        For InstrumentedSequenceLogReader, please add javadoc for the class.
        This class should be generic as well. How about passing the exception that getPos() is supposed to throw to ctor of InstrumentedSequenceLogReader ?

        Show
        Ted Yu added a comment - Some comments for patch v2. In MasterFileSystem.java, the following check should be put as first in if statement because it is fast: e instanceof HLogLengthMisMatchException In HLog.java: + if (e instanceof HLogLengthMisMatchException) { + throw new HLogLengthMisMatchException( I don't think we need the above. Instead, we should check for e.cause in MasterFileSystem.java For HLogLengthMisMatchException.java, the M in Match should be lower case. I think it should be a generic exception. So maybe rename it to FileLengthMismatchException ? For InstrumentedSequenceLogReader, please add javadoc for the class. This class should be generic as well. How about passing the exception that getPos() is supposed to throw to ctor of InstrumentedSequenceLogReader ?
        Hide
        ramkrishna.s.vasudevan added a comment -

        Patch for 0.90

        Show
        ramkrishna.s.vasudevan added a comment - Patch for 0.90
        Hide
        Lars Hofhansl added a comment -

        +1 on aborting the Master when this happens
        +1 on changing the assert to try/catch.

        Generally assert should only be used with conditions that we can live with in a production environment (as Todd points out).

        Show
        Lars Hofhansl added a comment - +1 on aborting the Master when this happens +1 on changing the assert to try/catch. Generally assert should only be used with conditions that we can live with in a production environment (as Todd points out).
        Hide
        ramkrishna.s.vasudevan added a comment -

        @Todd

        Ok.. I can update the patch accordingly. I agree with your suggestion.

        Show
        ramkrishna.s.vasudevan added a comment - @Todd Ok.. I can update the patch accordingly. I agree with your suggestion.
        Hide
        Todd Lipcon added a comment -

        If this is an actual error condition that can be seen in practice (rather than indication of a failed programmer assumption) it should be changed to an if/throw, not an assert. Most users don't have asserts enabled in production.

        Show
        Todd Lipcon added a comment - If this is an actual error condition that can be seen in practice (rather than indication of a failed programmer assumption) it should be changed to an if/throw, not an assert. Most users don't have asserts enabled in production.
        Hide
        ramkrishna.s.vasudevan added a comment -

        Patch for 0.90.

        Show
        ramkrishna.s.vasudevan added a comment - Patch for 0.90.
        Hide
        Ted Yu added a comment -

        Suggested approach makes sense.

        Show
        Ted Yu added a comment - Suggested approach makes sense.

          People

          • Assignee:
            ramkrishna.s.vasudevan
            Reporter:
            ramkrishna.s.vasudevan
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development