HBase / HBASE-6652

[replication] Default values of replicationQueueSizeCapacity and replicationQueueNbCapacity are too big; slave regionserver may run out of memory after master starts replication

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.94.1
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None

      Description

      Currently replicationQueueSizeCapacity defaults to 64M and replicationQueueNbCapacity defaults to 25000. So when a master cluster with many region servers replicates to a small cluster, the slave's RPC call queue fills up and the slave region servers run out of memory.

      java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?
      at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
      at java.util.concurrent.FutureTask.get(FutureTask.java:83)
      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1376)
      at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:700)
      at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.batch(HTablePool.java:361)
      at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:172)
      at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:129)
      at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:139)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.replicateLogEntries(HRegionServer.java:4018)
      at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:361)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1414)
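
The fan-in problem described above can be put into rough numbers. The following is only a back-of-the-envelope sketch, not HBase code: the class and method names are hypothetical, the cluster sizes (100 master region servers, 5 slave region servers) are made up for illustration, and it assumes that every master region server ships one full batch of replicationQueueSizeCapacity bytes at the same time and that batches spread evenly over the slave region servers.

```java
// Rough fan-in estimate for one slave (sink) region server.
// Assumptions (not taken from HBase source): even spread of sources over
// sinks, and one full-size batch in flight per source at the same moment.
final class ReplicationFanIn {
    static final long MIB = 1024L * 1024L;

    /** Batches that can target one sink at once (ceiling of sources / sinks). */
    static long batchesPerSink(int sourceServers, int sinkServers) {
        if (sourceServers < 0 || sinkServers <= 0) {
            throw new IllegalArgumentException("need sourceServers >= 0 and sinkServers > 0");
        }
        return (sourceServers + sinkServers - 1L) / sinkServers;
    }

    /** Bytes of replication payload that one sink may have to hold at once. */
    static long inFlightBytesPerSink(int sourceServers, int sinkServers, long batchBytes) {
        return batchesPerSink(sourceServers, sinkServers) * batchBytes;
    }

    public static void main(String[] args) {
        // Hypothetical cluster sizes; 64 MiB is the batch size from the description.
        long bytes = inFlightBytesPerSink(100, 5, 64 * MIB);
        System.out.println(bytes / MIB + " MiB in flight per sink");
    }
}
```

Under these assumptions a single slave region server would have to buffer 20 batches of 64 MiB, which is more than a small heap can absorb on top of its normal load.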

        Activity

        terry zhang added a comment -

        Another case that can cause the slave region servers to OOM is when the master cluster disables replication and is restarted many times. When replication is enabled again, the master region servers start many recovery threads (there are many ZK nodes under replication/rs/xxx/). This again puts the slave region servers under very heavy load.

        terry zhang added a comment -

        If we apply the patch from HBASE-6165 and don't set a custom queue size, replication will use the IPC call queue. So if hbase.regionserver.handler.count is set too high, the slave cluster's region servers may run out of memory while replication is running. Can we change the default value of replicationQueueSizeCapacity to 4M?
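
To illustrate why the handler count matters here, the sketch below compares the two batch sizes mentioned in this issue. It is a simplified model, not HBase code: the class name, the method name, and the handler count of 100 are hypothetical, and it assumes the worst case where every handler thread is processing one full-size replication batch at the same time.

```java
// Worst-case payload held by busy handlers on one slave region server.
// Assumption (not taken from HBase source): each handler holds exactly one
// full replication batch while it processes the call.
final class HandlerBatchMemory {
    static final long MIB = 1024L * 1024L;

    /** handlerCount batches of batchBytes each, with overflow checking. */
    static long worstCaseBytes(int handlerCount, long batchBytes) {
        if (handlerCount < 0 || batchBytes < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return Math.multiplyExact((long) handlerCount, batchBytes);
    }

    public static void main(String[] args) {
        int handlers = 100; // hypothetical handler count
        System.out.println("64M batches: " + worstCaseBytes(handlers, 64 * MIB) / MIB + " MiB");
        System.out.println("4M batches: " + worstCaseBytes(handlers, 4 * MIB) / MIB + " MiB");
    }
}
```

In this model the proposed 4M default reduces the worst case by a factor of 16 for any handler count, which is the effect the suggestion is aiming for.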

        stack added a comment -

        Should we set default replicationQueueNbCapacity smaller? (Replicating from a too big cluster into a too small one seems to be a pretty common affair)


          People

          • Assignee:
            terry zhang
            Reporter:
            terry zhang
          • Votes:
            0
            Watchers:
            4
