Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-22769

Runtime Error on join (with filter) when using hbase-spark connector

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Blocker
    • Resolution: Unresolved
    • connector-1.0.0
    • None
    • hbase-connectors
    • None
    • Built using maven scala plugin on intellij IDEA with Maven 3.3.9. Ran on Azure HDInsight Spark cluster using Yarn. 

      Spark version: 2.4.0

      Scala version: 2.11.12

      hbase-spark version: 1.0.0

    Description

      I am attempting to do a left outer join (though any join with a push down filter causes this issue) between a Spark Structured Streaming DataFrame and a DataFrame read from HBase. I get the following stack trace when running a simple spark app that reads from a streaming source and attempts to left outer join with a dataframe read from HBase:

      {{19/07/30 18:30:25 INFO DAGScheduler: ShuffleMapStage 1 (start at SparkAppTest.scala:88) failed in 3.575 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 10, wn5-edpspa.hnyo2upsdeau1bffc34wwrkgwc.ex.internal.cloudapp.net, executor 2): org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1609) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:1154) at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2967) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3301) at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor15461.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1605) }}

      {{... 8 more }}

      {{Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/spark/datasources/JavaBytesEncoder$ at org.apache.hadoop.hbase.spark.datasources.JavaBytesEncoder.create(JavaBytesEncoder.scala) at org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter.parseFrom(SparkSQLPushDownFilter.java:196) }}

      ... 12 more 

      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100) at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:359) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:347) at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:344) at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:242) at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:58) at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127) at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:387) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:361) at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107) at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

       

      It appears to be attempting to reference a file called "JavaBytesEncoder$.class" resulting in a NoClassDefFoundError. Interestingly, when I unzipped the jar I found that both "JavaBytesEncoder.class" and "JavaBytesEncoder$.class" exist, but the latter is simply an empty file. This might just be a case of me misunderstanding how Java links classes upon build however.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              nbanholzer Noah Banholzer
              Votes:
              2 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated: