Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 1.5.0
Description
I have a partitioned JSON table with 1824 partitions.
val df = sqlContext.read.format("json").load("aPartitionedJsonData")
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
It looks like, for JSON, we refresh the metadata every time buildScan is called. For a partitioned table, buildScan is called once per partition, so it looks like this table gets refreshed 1824 times.
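To make the cost pattern concrete, here is a minimal toy model of the behavior described above. It is not Spark code: `ToyRelation`, `refreshOnEveryScan`, and `scanAll` are hypothetical names invented for illustration, and the "refresh once" branch is only one possible mitigation, not necessarily the fix that was applied.

```scala
// Toy model only: none of these names come from Spark.
object RefreshCountSketch {

  // Stand-in for a relation whose metadata refresh is expensive
  // (for example, listing every file under the table path).
  class ToyRelation(refreshOnEveryScan: Boolean) {
    var refreshCount = 0
    private var refreshed = false

    private def refresh(): Unit = {
      refreshCount += 1
      refreshed = true
    }

    // In this model the planner calls buildScan once per partition.
    def buildScan(partition: Int): Unit = {
      // Refresh on every call, or only on the first one.
      if (refreshOnEveryScan || !refreshed) refresh()
      // ... reading the partition's files would happen here ...
    }
  }

  // Scans every partition and returns how many refreshes were triggered.
  def scanAll(relation: ToyRelation, numPartitions: Int): Int = {
    (0 until numPartitions).foreach(relation.buildScan)
    relation.refreshCount
  }

  def main(args: Array[String]): Unit = {
    // Refreshing inside every buildScan: one refresh per partition.
    println(scanAll(new ToyRelation(refreshOnEveryScan = true), 1824))  // prints 1824
    // Refreshing once and reusing the result: a single refresh.
    println(scanAll(new ToyRelation(refreshOnEveryScan = false), 1824)) // prints 1
  }
}
```

In this sketch the number of refreshes grows linearly with the partition count when the refresh sits inside buildScan, which matches the 1824 refreshes suspected for this table.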