HBASE-1858

Master can't split logs created by THBase

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.1, 0.90.0
    • Component/s: master
    • Labels:
      None

      Description

      When the master tries to split logs created by THBase, it fails because it tries to read them with the wrong key type. (THBase subclasses HLogKey to add fields to the key.)

      2009-09-16 09:03:01,943 WARN org.apache.hadoop.hbase.regionserver.HLog: Exception processing hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743/hlog.dat.1253103102168 -- continuing. Possible DATA LOSS!
      java.io.IOException: wrong key class: org.apache.hadoop.hbase.regionserver.HLogKey is not class org.apache.hadoop.hbase.regionserver.transactional.THLogKey
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824)
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876)
      	at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:880)
      	at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:802)
      	at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:274)
      	at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:492)
      	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426)
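
      For readers unfamiliar with this failure mode, here is a minimal sketch of the kind of check that appears to produce the "wrong key class" error above. It does not use Hadoop or HBase: `BaseKey`, `TransactionalKey`, and `Reader` are hypothetical stand-ins for HLogKey, THLogKey, and the sequence-file reader, and the exact-class comparison is an assumption inferred from the error message, not copied from the Hadoop source.

```java
import java.io.IOException;

public class KeyClassCheckSketch {
    // Hypothetical stand-ins for HLogKey and its transactional subclass THLogKey.
    static class BaseKey {}
    static class TransactionalKey extends BaseKey {}

    // Hypothetical reader that remembers which key class the log was written with.
    static class Reader {
        private final Class<?> recordedKeyClass;

        Reader(Class<?> recordedKeyClass) {
            this.recordedKeyClass = recordedKeyClass;
        }

        // Rejects any key object whose class is not exactly the recorded one,
        // so passing the base class for a log written with the subclass fails.
        void next(Object key) throws IOException {
            if (key.getClass() != recordedKeyClass) {
                throw new IOException("wrong key class: " + key.getClass().getName()
                        + " is not " + recordedKeyClass);
            }
        }
    }

    // Returns "ok" on success, or the exception message on a class mismatch.
    static String tryRead(Reader reader, Object key) {
        try {
            reader.next(key);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The log was written with the transactional key class.
        Reader reader = new Reader(TransactionalKey.class);
        // The splitter hands in a plain base key: rejected.
        System.out.println(tryRead(reader, new BaseKey()));
        // Handing in the matching key class is accepted.
        System.out.println(tryRead(reader, new TransactionalKey()));
    }
}
```

      In this sketch the subclass relationship does not help: the reader compares classes exactly, which matches the symptom reported here, where the splitter supplies the base key for a log written with the subclass.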

      1. 1858.patch
        21 kB
        Clint Morgan
      2. 1858-v2.patch
        23 kB
        Clint Morgan

        Activity

        James Kennedy added a comment -

        Well, the THLog.class does override the read/write serialization methods to add its own fields. So if you've already serialized HLogs you may run into trouble.

        I can't say for sure, though, what will happen without digging further.
        I suggest you try without conversion and see what happens.
        Back up your data first, though.

        George P. Stathis added a comment -

        James, do you know if I can make the hbase.regionserver.hlog.keyclass config change on a cluster that already has transactional tables with data in them or do I need some sort of data conversion process now?

        James Kennedy added a comment -

        You do need to specify the HLogKey class in your config:

        <property>
        <name>hbase.regionserver.hlog.keyclass</name>
        <value>org.apache.hadoop.hbase.regionserver.transactional.THLogKey</value>
        </property>

        I apologize for the poor documentation in the hbase-transactional-indexed project. It is not mature yet, though we have significant updates going in soon that are compatible with HBase 0.89. Note that in that version you will NOT need to specify THLogKey in the config.
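
        As a rough illustration of how a property like this can drive key creation, here is a minimal sketch that resolves the class name from configuration and instantiates it reflectively. It is not the HBase implementation: `java.util.Properties` stands in for the Hadoop configuration object, and `DefaultKey`, `TransactionalKey`, and `newKey` are hypothetical names. Only the property name comes from this issue.

```java
import java.util.Properties;

public class KeyClassFromConfigSketch {
    // Hypothetical stand-ins for HLogKey and THLogKey.
    public static class DefaultKey {}
    public static class TransactionalKey extends DefaultKey {}

    // Property name taken from this issue.
    static final String KEY_CLASS_PROPERTY = "hbase.regionserver.hlog.keyclass";

    // Creates a key of the configured class, falling back to DefaultKey
    // when the property is not set.
    static DefaultKey newKey(Properties conf) {
        String className = conf.getProperty(KEY_CLASS_PROPERTY, DefaultKey.class.getName());
        try {
            Class<? extends DefaultKey> keyClass =
                    Class.forName(className).asSubclass(DefaultKey.class);
            return keyClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate key class " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Without the property, the default key class is used.
        System.out.println(newKey(conf).getClass().getSimpleName());

        // With the property set, writer and splitter both get the same subclass.
        conf.setProperty(KEY_CLASS_PROPERTY, TransactionalKey.class.getName());
        System.out.println(newKey(conf).getClass().getSimpleName());
    }
}
```

        The point of routing both the log writer and the log splitter through one configured class name is that they cannot disagree about the key type, provided every node reads the same configuration.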

        George P. Stathis added a comment -

        Folks, unless there is something special that needs to be done when configuring transactional tables, it unfortunately seems that this problem is still occurring in 0.20.3:

        2010-08-27 15:53:06,030 DEBUG regionserver.HLog - Splitting hlog 2 of 31: hdfs://domU-12-31-39-18-19-65.compute-1.internal:9000/hbase/.logs/domU-12-31-39-18-15-24.compute-1.internal,60020,1282782970585/hlog.dat.1282798940318, length=9831249
        2010-08-27 15:53:06,033 DEBUG regionserver.HLog - IOE Pushed=0 entries from hdfs://domU-12-31-39-18-19-65.compute-1.internal:9000/hbase/.logs/domU-12-31-39-18-15-24.compute-1.internal,60020,1282782970585/hlog.dat.1282798940318
        2010-08-27 15:53:06,033 WARN  regionserver.HLog - Exception processing hdfs://domU-12-31-39-18-19-65.compute-1.internal:9000/hbase/.logs/domU-12-31-39-18-15-24.compute-1.internal,60020,1282782970585/hlog.dat.1282798940318 -- continuing. Possible DATA LOSS!
        java.io.IOException: wrong key class: org.apache.hadoop.hbase.regionserver.HLogKey is not class org.apache.hadoop.hbase.regionserver.transactional.THLogKey
        	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824)
        	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876)
        	at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:966)
        	at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:872)
        	at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:286)
        	at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:494)
        	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:429)

        I'm not sure, though, what the "define the HLogKey class in the conf" part means, so I might be missing that...

        stack added a comment -

        Applied to branch. Closing. Thanks for the patch Clint.

        stack added a comment -

        Backporting Jon's fixes to TestTHLogRecovery from trunk, where he updates it to the new API, seems to make the test pass. I'm running all tests now.

        Andrew Purtell added a comment -

        Committed to trunk only. TestTHLogRecovery fails for me on 0.20:

        Testcase: testWithoutFlush took 30.077 sec
        Caused an ERROR
        java.io.IOException: Only Puts in BU as of 0.20.0
        java.lang.RuntimeException: java.io.Exception: Only Puts in BU as of 0.20.0
        	at org.apache.hadoop.hbase.client.transactional.HBaseBackedTransactionLogger.forgetTransaction(HBaseBackedTransactionLogger.java:136)
        	at org.apache.hadoop.hbase.client.transactional.TransactionManager.doCommit(TransactionManager.java:192)
        	at org.apache.hadoop.hbase.client.transactional.TransactionManager.tryCommit(TransactionManager.java:154)
        	at org.apache.hadoop.hbase.regionserver.transactional.TestTHLogRecovery.testWithoutFlush(TestTHLogRecovery.java:118)
        Andrew Purtell added a comment -

        +1 will commit if all tests pass.

        Clint Morgan added a comment -

        Forgot to update the test. All tests now pass.

        Clint Morgan added a comment -

        This patch cleans up the transaction WALing and makes it work.

        • Introduces the config property "hbase.regionserver.hlog.keyclass", which is used to instantiate keys.
        • When we commit a transaction, we let the puts/deletes go into the normal WAL. This way, they are handled normally during recovery. The only time we need to do special WAL recovery in the transactional layer is when we find a transaction that started but does not have a commit or abort message. In this case, its status should be in the "global" trx log.
        • Cleans up the use of the "global" transaction log. This holds the state of a transaction while the transaction is still in progress. This state can be forgotten after a successful commit/abort. This state is only used when we recover from the WAL and don't know what actually happened to the transaction.
        • Ports HBaseBackedTransactionLogger to the new API.
        • Fixes a bug: when a transaction makes multiple puts to the same cell, the last put is now the one used for a trx-local get.

        I tested the WAL recovery and it works for me.
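
        The recovery rule described in the list above can be sketched roughly as follows. This is an illustration only, not the code in the patch: `Entry`, `Op`, `Status`, `findUnresolved`, and `toReplay` are hypothetical names, the WAL is modeled as an in-memory list, and treating a transaction that is missing from the global log as "do not replay" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TrxRecoverySketch {
    enum Op { START, COMMIT, ABORT }           // transactional markers in the WAL
    enum Status { PENDING, COMMITTED, ABORTED } // state kept in the "global" trx log

    static final class Entry {
        final long trxId;
        final Op op;

        Entry(long trxId, Op op) {
            this.trxId = trxId;
            this.op = op;
        }
    }

    // Transactions with a START entry but no COMMIT or ABORT entry in the WAL.
    static Set<Long> findUnresolved(List<Entry> wal) {
        Set<Long> open = new LinkedHashSet<>();
        for (Entry e : wal) {
            if (e.op == Op.START) {
                open.add(e.trxId);
            } else {
                open.remove(e.trxId);
            }
        }
        return open;
    }

    // Of the unresolved transactions, only those the global log reports as
    // committed need their edits replayed. Unknown ids are skipped (assumption).
    static List<Long> toReplay(Set<Long> unresolved, Map<Long, Status> globalLog) {
        List<Long> replay = new ArrayList<>();
        for (Long id : unresolved) {
            if (globalLog.get(id) == Status.COMMITTED) {
                replay.add(id);
            }
        }
        return replay;
    }

    public static void main(String[] args) {
        List<Entry> wal = Arrays.asList(
                new Entry(1, Op.START), new Entry(2, Op.START),
                new Entry(1, Op.COMMIT), new Entry(3, Op.START));
        Map<Long, Status> globalLog = new HashMap<>();
        globalLog.put(2L, Status.COMMITTED);
        globalLog.put(3L, Status.ABORTED);

        Set<Long> unresolved = findUnresolved(wal);
        System.out.println("unresolved: " + unresolved);
        System.out.println("replay: " + toReplay(unresolved, globalLog));
    }
}
```

        Transaction 1 never reaches the global log lookup because its commit marker is already in the WAL; only transactions 2 and 3 need the extra check.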

        Clint Morgan added a comment -

        I have a fix for this (define the HLogKey class in the conf), but I've found some other issues with restoring from trx log that I'm working on.

        Should have something by tomorrow.

        stack added a comment -

        Do you have a fix for this Clint?


          People

          • Assignee: Clint Morgan
          • Reporter: Clint Morgan
          • Votes: 0
          • Watchers: 2