CARBONDATA-1338: Spark cannot query data when 'spark.carbon.hive.schema.store' is true


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

    Description

      My steps are as below:

       
      1. Set spark.carbon.hive.schema.store=true in spark-defaults.conf.
      2. Start spark-shell with the CarbonData jars:

      spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar

      3. In the shell, create a Carbon session, then create, load, and query a table:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.CarbonSession._

      val rootPath = "hdfs://mycluster/user/master/carbon"
      val storeLocation = s"$rootPath/store"
      val warehouse = s"$rootPath/warehouse"
      val metastoredb = s"$rootPath/metastore_db"

      val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
      carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
      carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
      carbon.sql("select * from temp.yuhai_carbon").show
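      For completeness, the same property can also be passed through the session builder rather than spark-defaults.conf. This is only a sketch, assuming the flag is picked up from the Spark conf when the Carbon session is created; .config(key, value) is the standard SparkSession builder API:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.CarbonSession._

      // Assumption: setting the flag on the builder is equivalent to
      // putting it in spark-defaults.conf.
      val carbon = SparkSession
        .builder()
        .config("spark.carbon.hive.schema.store", "true")
        .enableHiveSupport()
        .getOrCreateCarbonSession(storeLocation, metastoredb)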
      

      Exception:

       
      Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema 
        at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) 
        at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) 
        at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) 
        at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) 
        at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) 
        at org.apache.spark.scheduler.Task.run(Task.scala:104) 
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
        at java.lang.Thread.run(Thread.java:745) 
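      The trace shows that, at query time, SchemaReader.readCarbonTableFromStore still looks for the schema file under the store path (.../Metadata/schema), even though with 'spark.carbon.hive.schema.store' enabled the schema is presumably kept in the Hive metastore and that file is never written. The missing file can be confirmed with the standard Hadoop FileSystem API; a minimal check, using the path from the exception message:

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      // Path copied from the exception message above.
      val schemaPath = new Path("hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema")
      val fs = schemaPath.getFileSystem(new Configuration())

      // Expected to print false here, matching the "File does not exist" error.
      println(s"schema file exists: ${fs.exists(schemaPath)}")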
      


          People

            Assignee: cen yuhai
            Reporter: cen yuhai
            Votes: 0
            Watchers: 1


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 5.5h