Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version: 3.0.0
Description
# Assumes an active SparkSession `spark`
from pyspark.sql import functions as F

df = spark.createDataFrame([[1, [[1, 2]]]], schema='x:int,y:struct<a:array<int>>')
df.write.mode('overwrite').parquet('test')
# This causes an error "Caused by: java.lang.RuntimeException: Couldn't find x#720 in [y#721]"
spark.read.parquet('test').select(F.expr('y.a[x]')).show()

# Explain works fine; note that it doesn't read x in ReadSchema
spark.read.parquet('test').select(F.expr('y.a[x]')).explain()

== Physical Plan ==
*(1) !Project [y#713.a[x#712] AS y.a AS `a`[x]#717]
+- FileScan parquet [y#713] Batched: false, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<y:struct<a:array<int>>>
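For reference, the expected result of the failing query can be worked out by hand from the single row created above; a minimal plain-Python sketch (no Spark required) of what `y.a[x]` should evaluate to:

```python
# The one row in the DataFrame above: x = 1, y is a struct with a = [1, 2].
row = {"x": 1, "y": {"a": [1, 2]}}

# y.a[x] uses zero-based array indexing, so it should evaluate to
# row["y"]["a"][row["x"]] = [1, 2][1] = 2, instead of failing at runtime.
expected = row["y"]["a"][row["x"]]
print(expected)  # → 2
```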
The query works if I do either of the following:
# manually select the column it misses
spark.read.parquet('test').select(F.expr('y.a[x]'), F.col('x')).show()

# or use the element_at function
spark.read.parquet('test').select(F.element_at('y.a', F.col('x') + 1)).show()
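The `+ 1` in the second workaround is needed because Spark SQL's `element_at` uses 1-based indexing for positive positions, while the `y.a[x]` bracket extraction is 0-based. A small plain-Python sketch of that semantic difference (the `element_at` helper here is a hypothetical stand-in that mimics the positive-index behavior, not Spark's implementation):

```python
def element_at(arr, index):
    # Mimics Spark SQL element_at semantics: positions are 1-based,
    # index 0 is invalid, and negative indices count from the end.
    if index == 0:
        raise ValueError("SQL array element_at index must not be 0")
    return arr[index - 1] if index > 0 else arr[index]

a, x = [1, 2], 1
# y.a[x] (0-based) and element_at(y.a, x + 1) (1-based) select the same element:
print(a[x])                  # → 2
print(element_at(a, x + 1))  # → 2
```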