Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Found this running TestReplication:

      2010-12-15 17:58:33,639 DEBUG [MASTER_SERVER_OPERATIONS-h17.sfo.stumble.net:58644-0] wal.HLogSplitter(299)
      : Closed hdfs://localhost:58631/user/jdcryans/test/211477a0a924abda419b5579c7a83452/recovered.edits/0000000000000000002
      2010-12-15 17:58:33,642 ERROR [MASTER_SERVER_OPERATIONS-h17.sfo.stumble.net:58644-0] master.MasterFileSystem(197):
       Failed splitting hdfs://localhost:58631/user/jdcryans/.logs/h17.sfo.stumble.net,58647,1292464631034
      java.io.IOException: Discovered orphan hlog after split. Maybe HRegionServer was not dead when we started
              at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:290)
              at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:151)
              at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:193)
              at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:96)
              at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:680)
      2010-12-15 17:58:33,686 INFO  [MASTER_SERVER_OPERATIONS-h17.sfo.stumble.net:58644-0] handler.ServerShutdownHandler(144):
       Reassigning 8 region(s) that h17.sfo.stumble.net,58647,1292464631034 was carrying (skipping 0 regions(s) that are already in transition)
      

      What I see is that there was an orphan HLog, but the exception was swallowed in MasterFileSystem.splitLog (it is only logged as an error), and the master then goes on to reassign the regions. There is potential data loss.

      Another bad side effect is that those HLogs never get archived and stay in .logs.
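
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the HBase code: the class, interface, and step names below are hypothetical and exist only to show how catching and logging an IOException lets the following step (region reassignment) run even though the split failed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names come from HBase. */
public class SwallowedSplitFailureSketch {

    /** Stand-in for the log-splitting step, which may fail with an IOException. */
    public interface LogSplitter {
        void splitLog() throws IOException;
    }

    /**
     * Mimics the pattern described in the issue: the split failure is
     * only recorded (as if logged at ERROR) and is not rethrown.
     * Returns the list of steps that were executed, in order.
     */
    public static List<String> processShutdown(LogSplitter splitter) {
        List<String> steps = new ArrayList<>();
        try {
            splitter.splitLog();
            steps.add("split-ok");
        } catch (IOException e) {
            // The exception stops here; the caller never learns the split failed.
            steps.add("split-failed-logged");
        }
        // Runs regardless of whether the split succeeded.
        steps.add("reassign-regions");
        return steps;
    }
}
```

With a splitter that always throws, the returned steps still end with the reassignment step, which is the behavior the description flags as risky.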

      1. HBASE-3367.patch
        4 kB
        Jean-Daniel Cryans

        Activity

        Transition: Open -> Resolved
        Time In Source Status: 1d 18h 34m
        Execution Times: 1
        Last Executer: Jean-Daniel Cryans
        Last Execution Date: 18/Dec/10 00:17
        Hudson added a comment -

        Integrated in HBase-TRUNK #1697 (See https://hudson.apache.org/hudson/job/HBase-TRUNK/1697/)

        Jean-Daniel Cryans made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Jean-Daniel Cryans added a comment -

        Committed to branch and trunk.

        stack added a comment -

        +1

        If it doesn't work on a single retry, then it deserves to fail, I'd say. Patch looks good, J-D.

        Jean-Daniel Cryans made changes -
        Field Original Value New Value
        Attachment HBASE-3367.patch [ 12466435 ]
        Jean-Daniel Cryans added a comment -

        Hackish fix: I'm not sure how handlers are supposed to be retried, so instead I retry once when catching the exception.
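
A rough sketch of the "retry once on exception" idea, for readers who have not opened the patch. This is not the contents of HBASE-3367.patch: the class, interface, and method names are assumptions made up for illustration. It assumes the retried work is safe to run twice and that a second failure should reach the caller rather than be swallowed.

```java
import java.io.IOException;

/** Hypothetical sketch; names are illustrative and not taken from the patch. */
public class RetryOnceSketch {

    /** Stand-in for the log-splitting work, which may throw an IOException. */
    @FunctionalInterface
    public interface SplitTask {
        void run() throws IOException;
    }

    /**
     * Runs the task and, if it throws an IOException, retries exactly once.
     * A failure on the second attempt propagates to the caller.
     * Returns the number of attempts that were made.
     */
    public static int runWithSingleRetry(SplitTask task) throws IOException {
        try {
            task.run();
            return 1;
        } catch (IOException first) {
            // First failure: report it and try one more time.
            System.err.println("Split failed, retrying once: " + first.getMessage());
            task.run(); // a second IOException is not caught here
            return 2;
        }
    }
}
```

The design choice is deliberately blunt: there is no backoff and no loop, so a persistent failure surfaces after the second attempt and the caller can decide not to proceed.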

        Jean-Daniel Cryans created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Jean-Daniel Cryans
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
