Details
Improvement
Status: Open
Major
Resolution: Unresolved
3.3.2
None
None
Description
When a query selects more than one field from a struct after explode, nested schema pruning is not applied and all fields of the struct are read.
Example:
1) Loading data
val jsonStr = """{ "items": [ {"itemId": 1, "itemData1": "a", "itemData2": 11}, {"itemId": 2, "itemData1": "b", "itemData2": 22} ] }"""
val df = spark.read.json(Seq(jsonStr).toDS)
df.write.format("parquet").mode("overwrite").saveAsTable("persisted")
2) Read query with explain
val read = spark.table("persisted")
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true)
read
  .select(explode('items).as('item))
  .select($"item.itemId", $"item.itemData1")
  .explain
// ReadSchema: struct<items:array<struct<itemData1:string,itemData2:bigint,itemId:bigint>>>
The query uses only the itemId and itemData1 fields of the struct inside the array, but the read schema contains the itemData2 field as well.
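To compare the one-field and two-field cases without reading the explain output by eye, the schema requested by the file scan can be pulled from the physical plan. The following is only a rough sketch, assuming a spark-shell session (so `spark` and `spark.implicits._` are already in scope) and the `persisted` table created in step 1. The helper name `readSchemas` is hypothetical and not part of Spark; it relies on the scan being planned as a `FileSourceScanExec`, which is an assumption about how this table is read.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.FileSourceScanExec
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types.StructType

// Hypothetical helper: collect the schema that each file scan in the
// physical plan requests from the data source.
def readSchemas(df: DataFrame): Seq[StructType] =
  df.queryExecution.sparkPlan.collect {
    case scan: FileSourceScanExec => scan.requiredSchema
  }

spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true)

val exploded = spark.table("persisted").select(explode($"items").as("item"))

// Case A: a single struct field is selected after explode.
val oneField = exploded.select($"item.itemId")

// Case B: two struct fields are selected after explode (the case in this report).
val twoFields = exploded.select($"item.itemId", $"item.itemData1")

// Print the requested schema for each case so they can be compared.
readSchemas(oneField).foreach(s => println(s"one field:  ${s.catalogString}"))
readSchemas(twoFields).foreach(s => println(s"two fields: ${s.catalogString}"))
```

If the behaviour described above holds, the second printed schema should still list itemData2, matching the ReadSchema shown in the explain output.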