[SPARK-37450] Spark SQL reads unnecessary nested fields (another type of pruning case) - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.3.0
Component/s: SQL
Labels: None

Description

Following up on SPARK-34638, I may have found another case where nested field pruning is not applied. Here, a `count` over an exploded array column still reads every nested field of the array's struct elements.

Example:
1) Loading data

val jsonStr = """{
 "items": [
   {"itemId": 1, "itemData": "a"},
   {"itemId": 2, "itemData": "b"}
 ]
}"""
val df = spark.read.json(Seq(jsonStr).toDS)
df.write.format("parquet").mode("overwrite").saveAsTable("persisted")

2) Reading with explain

val read = spark.table("persisted")
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true)

read.select(explode($"items").as('item)).select(count(lit(true))).explain(true)
// Observed: the scan reads both nested fields, although the count uses neither value
// ReadSchema: struct<items:array<struct<itemData:string,itemId:bigint>>>
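
The `ReadSchema` shown by `explain` can also be checked programmatically, which may be handy when comparing behavior across Spark versions. The following is only a sketch, assuming a spark-shell session (so `spark` is in scope) and the `persisted` table from step 1. `readSchemas` is a hypothetical helper name, not part of Spark, and it inspects the plan before adaptive execution wraps it.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.FileSourceScanExec
import org.apache.spark.sql.functions.{col, count, explode, lit}

// Hypothetical helper (not a Spark API): collects the schema that each
// file scan in the physical plan asks the data source to read.
def readSchemas(df: DataFrame): Seq[String] =
  df.queryExecution.sparkPlan.collect {
    case scan: FileSourceScanExec => scan.requiredSchema.catalogString
  }

// Same query as in step 2, built with col(...) instead of the $ interpolator.
val counted = spark.table("persisted")
  .select(explode(col("items")).as("item"))
  .select(count(lit(true)))

// Prints one line per file scan; compare it with the ReadSchema from explain.
readSchemas(counted).foreach(println)
```

If pruning applies, the printed schema should list fewer nested fields than the full table schema.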

Issue Links

links to

[Github] Pull Request #34701 (viirya)

People

Assignee: L. C. Hsieh

Reporter: Jiri Humpolicek

Votes: 0

Watchers: 3

Dates

Created: 23/Nov/21 12:22

Updated: 12/Dec/22 18:11

Resolved: 02/Dec/21 17:13