[SPARK-10287] After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.5.0
Fix Version/s: 1.5.0
Component/s: SQL
Labels:
- releasenotes

Description

I have a partitioned JSON table with 1824 partitions. The following query triggers the problem:

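// Load the partitioned JSON dataset.
val df = sqlContext.read.format("json").load("aPartitionedJsonData")
// Build a comma-separated list of every column name.
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
// Hash each row and sum the hashes over the whole table,
// which forces a scan of every partition.
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)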

It looks like the JSON data source refreshes its metadata every time buildScan is called. For a partitioned table, buildScan is called once per partition, so this query appears to refresh the table's metadata 1824 times.
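
To make the cost concrete, here is a minimal toy model of the pattern described above. It does not use Spark, it is not the actual JSON relation code, and it is not the change made in the linked pull request. ToyJsonRelation, scanRefreshingEachTime, and scanAll are hypothetical names chosen for illustration, and refresh() only stands in for an expensive metadata refresh.

```scala
// Toy model only: every name here is hypothetical, none of it is Spark's API.
final class ToyJsonRelation(partitions: Seq[String]) {
  // Counts how many times the (simulated) metadata refresh has run.
  var refreshCount = 0

  // Stands in for an expensive metadata refresh such as re-listing files.
  private def refresh(): Unit = refreshCount += 1

  // Pattern described in this issue: the refresh happens inside the
  // per-partition scan, so it runs once for every partition.
  def scanRefreshingEachTime(partition: String): String = {
    refresh()
    s"rows of $partition"
  }

  // One possible alternative (an assumption, not the merged fix):
  // refresh once, then scan every partition without refreshing again.
  def scanAll(): Seq[String] = {
    refresh()
    partitions.map(p => s"rows of $p")
  }
}

object RefreshDemo {
  def main(args: Array[String]): Unit = {
    val parts = (1 to 1824).map(i => s"part=$i")

    val perScan = new ToyJsonRelation(parts)
    parts.foreach(perScan.scanRefreshingEachTime)
    println(s"refreshes when refreshing per scan: ${perScan.refreshCount}")

    val once = new ToyJsonRelation(parts)
    once.scanAll()
    println(s"refreshes when refreshing once: ${once.refreshCount}")
  }
}
```

In this model the first counter ends at 1824 and the second at 1, which mirrors the ratio reported for the 1824-partition table.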

Issue Links

links to

[Github] Pull Request #8469 (yhuai)

People

Assignee: Yin Huai
Reporter: Yin Huai
Votes: 0
Watchers: 2

Dates

Created: 26/Aug/15 04:57
Updated: 30/Aug/15 01:25
Resolved: 27/Aug/15 23:12