HBase
HBASE-6533

[replication] Replication will block if WAL compression is set differently in master and slave configurations

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.94.0
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

      Description

      As we know, in HBase 0.94.0 we have the configuration below:
      <property>
      <name>hbase.regionserver.wal.enablecompression</name>
      <value>true</value>
      </property>
      If we enable it in the master cluster and disable it in the slave cluster, replication will not work: the master cluster logs the error below (surfaced through unwrapRemoteException) again and again.

      2012-08-09 12:49:55,892 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of an error
      on the remote cluster:
      java.io.IOException: IPC server unable to read call parameters: Error in readFields
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
      at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
      at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
      at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635)
      at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
      Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call parameters: Error in readFields
      at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921)
      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151)
      at $Proxy13.replicateLogEntries(Unknown Source)
      at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616)
      ... 1 more

      This is because the slave cluster cannot parse the HLog entry:

      2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.232.98.89
      java.io.IOException: Error in readFields
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685)
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635)
      at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.EOFException
      at java.io.DataInputStream.readFully(DataInputStream.java:180)
      at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254)
      at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146)
      at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767)
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682)
      ... 11 more
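
      For reference, a quick way to see what a given cluster's configuration resolves this flag to is a sketch like the one below (illustrative only, class name is hypothetical; what actually matters is the value each region server loads at startup):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;

      // Illustrative sketch: print the effective WAL compression flag as seen by
      // the hbase-site.xml on this classpath. Run it once against the master
      // cluster's config and once against the slave's to spot a mismatch.
      public class ShowWalCompressionFlag {
        public static void main(String[] args) {
          Configuration conf = HBaseConfiguration.create();
          System.out.println("hbase.regionserver.wal.enablecompression = "
              + conf.getBoolean("hbase.regionserver.wal.enablecompression", false));
        }
      }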

      Attachments

      1. hbase-6533.patch (0.9 kB) by terry zhang

        Issue Links

        This issue duplicates HBASE-5778.

          Activity

          terry zhang added a comment -

          Sorry for creating so many identical issues; it was caused by a problem with my IE browser. Could anyone help me delete them?

          terry zhang added a comment -

          For now the only way to work around this issue is to set the master cluster back to uncompressed mode, delete the ZooKeeper node replication/rs, and restart the master cluster, because the replication slave does not support reading compressed HLogs.
          But if we have multiple master clusters and some of them have the HLog set to compressed mode, then we cannot handle that situation.

          terry zhang added a comment -

          This happens because the master sends the HLog entry in compressed form, but the slave does not know about it, so when the slave's IPC HBaseServer deserializes the buffer and reads the HLog entry fields, the error occurs. We could have the master send the buffer in uncompressed form; then whether or not the master uses HLog compression, the slave can work fine.
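
          A rough sketch of the idea described above (shipping the batch with the compression context cleared so the entries serialize in plain, uncompressed form). This is illustrative only and assumes HLogKey and WALEdit expose setCompressionContext(), as the 0.94 WAL reader/writer path does; the complete fix is discussed in HBASE-5778:

          import org.apache.hadoop.hbase.regionserver.wal.HLog;

          // Illustrative sketch only (see HBASE-5778 for the real fix). Assumption:
          // HLogKey/WALEdit expose setCompressionContext(); clearing it before the
          // batch is handed to replicateLogEntries() makes write() emit the plain
          // Writable form that a slave without WAL compression can deserialize.
          public class UncompressedShipping {
            static void stripCompressionContext(HLog.Entry[] entriesToShip) {
              for (HLog.Entry entry : entriesToShip) {
                entry.getKey().setCompressionContext(null);
                entry.getEdit().setCompressionContext(null);
              }
              // shipEdits() would then send entriesToShip over the wire as usual.
            }
          }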

          stack added a comment -

          It's a known issue that replication won't work with compressed WALs, Terry. I'm not sure if we already have an issue for this, so let's keep this one open.

          Jean-Daniel Cryans added a comment -

          terry zhang your solution is not complete, see https://issues.apache.org/jira/browse/HBASE-5778?focusedCommentId=13253995&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13253995

          terry zhang added a comment -

          Yes, Daniel and Stack. Right now replication can't work with the HLog in compressed mode, because compressed mode requires reading the HLog sequentially to build the CompressionContext dictionary. But replication does not read the entries in the HLog one by one (it uses seek), so it only gets a tag (dictIdx) from the HLog, while the original data does not exist in the CompressionContext. We usually get the error below:

          java.lang.IndexOutOfBoundsException: index (2) must be less than size (1)
          at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
          at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
          at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
          at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
          at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
          at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:111)
          at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:321)
          at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1851)
          at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
          at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:235)
          at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:206)
          at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:435)
          at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:311)

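          To make the failure mode concrete, here is a small self-contained toy (not HBase code, all names hypothetical) showing why dictionary-style compression requires reading the stream from the beginning: a reader that seeks past earlier entries has no matching dictionary slot for a later index, which is the same shape of failure as the IndexOutOfBoundsException from LRUDictionary above.

          import java.util.ArrayList;
          import java.util.List;

          // Toy illustration (not HBase code): repeated values are replaced by an
          // index into a dictionary built from earlier entries, so a reader that
          // skipped those entries cannot resolve the index.
          public class DictionaryStreamDemo {
            static List<String> writerDict = new ArrayList<String>();
            static List<String> readerDict = new ArrayList<String>();

            // First occurrence is written literally and remembered; later
            // occurrences are written as "#<index>".
            static String encode(String value) {
              int idx = writerDict.indexOf(value);
              if (idx >= 0) return "#" + idx;
              writerDict.add(value);
              return value;
            }

            // "#<index>" only resolves if the reader decoded the earlier entries.
            static String decode(String token) {
              if (token.startsWith("#")) {
                return readerDict.get(Integer.parseInt(token.substring(1)));
              }
              readerDict.add(token);
              return token;
            }

            public static void main(String[] args) {
              String[] stream = { encode("row1"), encode("row1"), encode("row2"), encode("row2") };
              // A reader that seeks to the third token never learned what "#1" means:
              System.out.println(decode(stream[2])); // "row2", stored at reader index 0
              System.out.println(decode(stream[3])); // "#1" -> IndexOutOfBoundsException
            }
          }
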
          terry zhang added a comment -

          Can we disable HLog compression mode when we start replication?

          HRegionServer.java
               if (!conf.getBoolean(HConstants.REPLICATION_ENABLE_KEY, false)) {
                 return;
               }
          +    if (conf.getBoolean(HConstants.ENABLE_WAL_COMPRESSION, false)) {
          +      throw new RegionServerRunningException("Region server master cluster doesn't support" +
          +      "Hlog working in compression mode!");
          +    }
           
               // read in the name of the source replication class from the config file.
               String sourceClassname = conf.get(HConstants.REPLICATION_SOURCE_SERVICE_CLASSNAME,
          

          Or we could change replication so that it does not use seek when reading the HLog, and does not close the HLog again and again when it hits an EOFException. Which one is better?

          Jean-Daniel Cryans added a comment -

          Currently those two features are just incompatible, you use one or the other.

          Maybe we should add a check in HBaseConfiguration to make sure both aren't enabled, no need to throw the exception that deep in the code (and you'd have to do it inside WAL compression for replication too).

          In any case the real fix is described in HBASE-5778, the rest is just hacks.
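
          For reference, a minimal sketch of the kind of early check suggested above, assuming the two boolean keys already referenced in the patch snippet (HConstants.REPLICATION_ENABLE_KEY and HConstants.ENABLE_WAL_COMPRESSION); where exactly such a check would live is left open:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HConstants;

          // Illustrative sketch only: fail fast if replication and WAL compression
          // are both enabled, since at this point the two features are incompatible.
          public class ReplicationCompressionCheck {
            public static void check(Configuration conf) {
              if (conf.getBoolean(HConstants.REPLICATION_ENABLE_KEY, false)
                  && conf.getBoolean(HConstants.ENABLE_WAL_COMPRESSION, false)) {
                throw new IllegalArgumentException(
                    "Replication and WAL compression cannot both be enabled (see HBASE-5778)");
              }
            }
          }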

          Michael Drzal added a comment -

          terry zhang I've cleaned up the duplicate issues for you.

          Michael Drzal added a comment -

          Jean-Daniel Cryans should we just close this out since the real fix is HBASE-5778?

          Lars Hofhansl added a comment -

          Closing as dup of HBASE-5778


            People

            • Assignee: terry zhang
            • Reporter: terry zhang
            • Votes: 0
            • Watchers: 7
