  1. Spark
  2. SPARK-17398

Failed to query external JSON partitioned table

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      1. Create an external JSON partitioned table
      with the SerDe in hive-hcatalog-core-1.2.1.jar, downloaded from
      https://mvnrepository.com/artifact/org.apache.hive.hcatalog/hive-hcatalog-core/1.2.1
      2. Querying the table throws the following exception; the same query works in Spark 1.5.2:
      Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost):
      java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hive.hcatalog.data.HCatRecord
      at org.apache.hive.hcatalog.data.HCatRecordObjectInspector.getStructFieldData(HCatRecordObjectInspector.java:45)
      at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:430)
      at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)

      3. Test code

      import org.apache.spark.SparkConf
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.hive.HiveContext

      object JsonBugs {

        def main(args: Array[String]): Unit = {
          val table = "test_json"
          val location = "file:///g:/home/test/json"
          // DDL for an external table backed by the HCatalog JSON SerDe
          val create = s"""CREATE EXTERNAL TABLE ${table}
            (id string, seq string)
            PARTITIONED BY (index int)
            ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
            LOCATION "${location}"
            """
          val add_part = s"""
            ALTER TABLE ${table} ADD
            PARTITION (index=1) LOCATION '${location}/index=1'
            """

          val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
          conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
          val ctx = new SparkContext(conf)

          val hctx = new HiveContext(ctx)
          // Create the table and its partition on the first run only
          val exist = hctx.tableNames().map { x => x.toLowerCase() }.contains(table)
          if (!exist) {
            hctx.sql(create)
            hctx.sql(add_part)
          } else {
            hctx.sql("show partitions " + table).show()
          }

          // Fails with the ClassCastException above on Spark 2.0.0
          hctx.sql("select * from test_json").show()
        }
      }
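
      The test code checks tableNames() before issuing the DDL so that it can be
      run more than once. A minimal sketch of an alternative, assuming Hive's
      IF NOT EXISTS forms of CREATE TABLE and ADD PARTITION are acceptable for the
      repro: build idempotent statements so that no existence check is needed.
      The object JsonTableDdl and its methods are hypothetical helpers, not part
      of the original report or of Spark.

```scala
// Hypothetical helpers (not part of the original report) that build
// idempotent Hive DDL for the repro table.
object JsonTableDdl {

  /** CREATE statement that is a no-op when the table already exists. */
  def createTable(table: String, location: String): String =
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS $table
       |(id string, seq string)
       |PARTITIONED BY (index int)
       |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
       |LOCATION '$location'""".stripMargin

  /** ADD PARTITION statement that is a no-op when the partition already exists. */
  def addPartition(table: String, location: String, index: Int): String =
    s"""ALTER TABLE $table ADD IF NOT EXISTS
       |PARTITION (index=$index) LOCATION '$location/index=$index'""".stripMargin

  def main(args: Array[String]): Unit = {
    val table = "test_json"
    val location = "file:///g:/home/test/json"
    // In the test code, each statement would be passed to hctx.sql(...).
    println(createTable(table, location))
    println(addPartition(table, location, 1))
  }
}
```

      With these helpers the if/else around the DDL could be replaced by two
      unconditional hctx.sql(...) calls; the failing select is unchanged.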

      Attachments

        1. screenshot-1.png (32 kB, bianqi)

            People

              Assignee: Wing Yew Poon (wypoon)
              Reporter: pin_zhang
              Votes: 0
              Watchers: 4
