[SPARK-38314] Fail to read parquet files after writing the hidden file metadata in - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.1
Fix Version/s: 3.3.0
Component/s: SQL
Labels: None

Description

Selecting the hidden file metadata column `_metadata` and then writing the resulting DataFrame to a file format such as `parquet` or `delta` keeps the internal `Attribute` metadata in the written schema. When those `parquet` or `delta` files are read back, the query breaks, because Spark wrongly treats the user data column named `_metadata` as a hidden file source metadata column.

Reproducible code:

// prepare a file source df (a DataFrame read from files, so that `_metadata` can be selected)
df.select("*", "_metadata")
  .write.format("parquet").save(path)
spark.read.format("parquet").load(path)
  .select("*").show()
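For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The object name `MetadataRoundTrip`, the temporary directories, and the sample rows are assumptions made for illustration and are not part of the original report. It assumes a Spark build that exposes the hidden `_metadata` column on file sources; on an affected build the final read is expected to fail as described.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical driver object; the name is illustrative only.
object MetadataRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-38314-repro")
      .getOrCreate()
    import spark.implicits._

    // Illustrative temp locations (assumption, not from the report).
    val base = Files.createTempDirectory("spark-38314").toString
    val sourcePath = s"$base/source"
    val path = s"$base/with_metadata"

    // Prepare a file source df: write sample rows, then read them back
    // so the DataFrame is backed by files and `_metadata` is selectable.
    Seq((1, "a"), (2, "b")).toDF("id", "value")
      .write.format("parquet").save(sourcePath)
    val df = spark.read.format("parquet").load(sourcePath)

    // Write the data together with the hidden file metadata column.
    df.select("*", "_metadata")
      .write.format("parquet").save(path)

    // Read the files back; this is the step the report says breaks.
    spark.read.format("parquet").load(path)
      .select("*").show()

    spark.stop()
  }
}
```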

Issue Links

links to

[Github] Pull Request #35650 (Yaohua628)

People

Assignee: Yaohua Zhao

Reporter: Yaohua Zhao

Votes: 0

Watchers: 3

Dates

Created: 24/Feb/22 10:25

Updated: 28/Feb/22 11:47

Resolved: 28/Feb/22 11:47