Spark / SPARK-32380

Spark SQL cannot access a Hive table whose data is stored in HBase


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.2.3, 3.3.2
    • Component/s: SQL
    • Labels: None

    Description

      • step 1: create the HBase table and insert a row
       hbase(main):001:0> create 'hbase_test', 'cf1'
       hbase(main):002:0> put 'hbase_test', 'r1', 'cf1:v1', '123'
      
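      For reference, a minimal sketch of the same write through the HBase client API (the hbase-client jar and an hbase-site.xml pointing at the cluster are assumed):

       import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
       import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
       import org.apache.hadoop.hbase.util.Bytes

       // Write the same row as the shell put above: rowkey r1, column cf1:v1.
       val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
       try {
         val table = connection.getTable(TableName.valueOf("hbase_test"))
         val put = new Put(Bytes.toBytes("r1"))
         put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("v1"), Bytes.toBytes("123"))
         table.put(put)
         table.close()
       } finally {
         connection.close()
       }
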
      • step 2: create a Hive external table mapped to the HBase table

      hive> CREATE EXTERNAL TABLE `hivetest.hbase_test`(
        `key` string COMMENT '',
        `value` string COMMENT '')
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.hbase.HBaseSerDe'
      STORED BY
        'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES (
        'hbase.columns.mapping'=':key,cf1:v1',
        'serialization.format'='1')
      TBLPROPERTIES (
        'hbase.table.name'='hbase_test');
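
      Before running the failing query it can help to confirm that Spark sees the storage handler; a small sanity check from spark-shell (a sketch, not part of the original report):

       // Should surface HBaseStorageHandler and the hbase.columns.mapping
       // recorded in the metastore by the DDL above.
       spark.sql("DESCRIBE FORMATTED hivetest.hbase_test").show(100, truncate = false)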
      • step 3: query the Hive table from Spark SQL (the data resides in HBase)
      spark-sql --master yarn -e "select * from hivetest.hbase_test"
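
      The same failure reproduces programmatically; a minimal sketch, assuming Hive support is built in and the HBase and hive-hbase-handler jars are on the classpath:

       import org.apache.spark.sql.SparkSession

       val spark = SparkSession.builder()
         .appName("hive-hbase-repro") // illustrative app name
         .enableHiveSupport()
         .getOrCreate()

       // Partition planning calls TableInputFormatBase.getSplits, so the
       // exception below surfaces as soon as the query is executed.
       spark.sql("select * from hivetest.hbase_test").show()
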
      The error log is as follows:

      java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
      at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270)
      at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131)
      at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
      at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
      at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
      at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
      at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
      at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
      at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
      at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385)
      at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:412)
      at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:58)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65)
      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
      at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
      at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
      at scala.collection.Iterator.foreach(Iterator.scala:941)
      at scala.collection.Iterator.foreach$(Iterator.scala:941)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
      at scala.collection.IterableLike.foreach(IterableLike.scala:74)
      at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
      at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
      at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
      at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:206)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
      at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
      at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
      at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
      at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
      at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
      at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:652)
      at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:265)
      ... 62 more
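
      The "Caused by" above is the real failure: getSplits runs before initializeTable has ever been called, so the input format has no table handle. The exception text states the contract directly: a TableInputFormatBase subclass must call initializeTable from its constructor or from its initialize/setConf hook. A minimal sketch of a subclass that honors the contract (illustrative only; the "hbase.table.name" lookup is an assumption, and this is not the patch that resolved the ticket):

       import java.io.IOException
       import org.apache.hadoop.conf.{Configurable, Configuration}
       import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
       import org.apache.hadoop.hbase.client.ConnectionFactory
       import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase

       // Hadoop's ReflectionUtils.newInstance calls setConf right after
       // instantiation when the class is Configurable, so the table gets
       // initialized before getSplits can run.
       class InitializedTableInputFormat extends TableInputFormatBase with Configurable {
         private var conf: Configuration = _

         override def getConf: Configuration = conf

         override def setConf(c: Configuration): Unit = {
           conf = HBaseConfiguration.create(c)
           // "hbase.table.name" mirrors the TBLPROPERTIES above; treat the
           // key as an assumption, not something the handler guarantees.
           val tableName = TableName.valueOf(conf.get("hbase.table.name"))
           try initializeTable(ConnectionFactory.createConnection(conf), tableName)
           catch { case e: IOException => throw new RuntimeException(e) }
         }
       }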

People

    Assignee: Attila Zsolt Piros (attilapiros)
    Reporter: deyzhong (meimile)
    Votes: 0
    Watchers: 5

Time Tracking

    Original Estimate: 72h
    Remaining Estimate: 72h
    Time Spent: Not Specified