Description
Recently we observed below error message on client side when put process blocked:
2016-04-26 10:27:11,707 ERROR [Sink: Unnamed (1/1)] com.alibaba.search.blink.streaming.connector.hbase.HBaseOutputFormat: Doing mutation failed org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:208) at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1694) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:208) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
And checking RS logs we found nothing noticable. After adding some logging to show the detailed exception, it turns out something went wrong in one of our coprocessors:
2016-04-26 12:03:13,776 ERROR [Sink: Unnamed (1/1)] org.apache.hadoop.hbase.client.AsyncProcess: Exception occurred! Exception details: [java.io.IOException: java.io.IOException: notify meta has not load success. at com.taobao.kart.coprocessor.server.CoprocessorNotifyMeta.checkMetaLoadSuccess(CoprocessorNotifyMeta.java:38) at com.taobao.kart.coprocessor.server.NotifyQualifySetter.updatePut(NotifyQualifySetter.java:47) at com.taobao.kart.coprocessor.server.NotifyQualifySetter.updatePut(NotifyQualifySetter.java:39) at com.taobao.kart.coprocessor.server.KartCoprocessor.prePut(KartCoprocessor.java:176) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$30.call(RegionCoprocessorHost.java:902) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1748) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.prePut(RegionCoprocessorHost.java:898) at org.apache.hadoop.hbase.regionserver.HRegion.doPreMutationHook(HRegion.java:2890) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2865)
So in this JIRA we propose to add a property to allow logging detailed exception stacktrace rather than statistics for batch errors, and I believe this would be useful for debugging in some cases.