SPARK-17398

Failed to query external JSON partitioned table


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      1. Create an external JSON partitioned table using the SerDe in
      hive-hcatalog-core-1.2.1.jar, downloaded from
      https://mvnrepository.com/artifact/org.apache.hive.hcatalog/hive-hcatalog-core/1.2.1
      2. Querying the table throws the following exception; the same query works in Spark 1.5.2:
      Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task
      0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hive.hcatalog.data.HCatRecord
      at org.apache.hive.hcatalog.data.HCatRecordObjectInspector.getStructFieldData(HCatRecordObjectInspector.java:45)
      at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:430)
      at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)

      3. Test Code

      import org.apache.spark.SparkConf
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.hive.HiveContext

      object JsonBugs {

        def main(args: Array[String]): Unit = {
          val table = "test_json"
          val location = "file:///g:/home/test/json"
          val create = s"""CREATE EXTERNAL TABLE ${table}
            (id string, seq string)
            PARTITIONED BY (index int)
            ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
            LOCATION "${location}"
            """
          val add_part = s"""
            ALTER TABLE ${table} ADD
            PARTITION (index=1) LOCATION '${location}/index=1'
            """

          val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
          conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
          val ctx = new SparkContext(conf)

          val hctx = new HiveContext(ctx)
          val exist = hctx.tableNames().map { x => x.toLowerCase() }.contains(table)
          if (!exist) {
            hctx.sql(create)
            hctx.sql(add_part)
          } else {
            hctx.sql("show partitions " + table).show()
          }

          hctx.sql("select * from test_json").show()
        }
      }
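      For context, the ClassCastException in the trace is ordinary JVM behavior: a
      reference that actually holds a java.util.ArrayList is cast to an unrelated
      class inside HCatRecordObjectInspector.getStructFieldData. The sketch below
      reproduces only that failure mode in plain Java; the Record class and the
      getStructFieldData method here are hypothetical stand-ins for HCatRecord and
      the inspector, not Spark or HCatalog code.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      public class CastFailureDemo {
          // Hypothetical stand-in for org.apache.hive.hcatalog.data.HCatRecord.
          static class Record {
              Object get(int pos) { return null; }
          }

          // Mimics an inspector that assumes every deserialized row is a Record.
          static Object getStructFieldData(Object data) {
              return ((Record) data).get(0); // throws ClassCastException if data is not a Record
          }

          public static void main(String[] args) {
              // The JSON SerDe path hands back the row as a plain List, not a Record.
              List<String> row = new ArrayList<>();
              row.add("id-1");
              try {
                  getStructFieldData(row);
                  System.out.println("no exception");
              } catch (ClassCastException e) {
                  System.out.println("ClassCastException, as in the stack trace above");
              }
          }
      }
      ```

      The fix therefore has to make Spark's table reader stop assuming the SerDe's
      deserialized row type, rather than change anything in the table definition.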

        Attachments

        1. screenshot-1.png
          32 kB
          bianqi


              People

               • Assignee: Wing Yew Poon (wypoon)
               • Reporter: pin_zhang
               • Votes: 0
               • Watchers: 4
