  Spark / SPARK-38314

Fail to read parquet files after writing the hidden file metadata in


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Selecting the hidden file metadata column `_metadata` from a DataFrame and then writing the result in a file format such as `parquet` or `delta` keeps the internal `Attribute` metadata that identifies the column as a hidden file source metadata column. When those `parquet` or `delta` files are read again, the read breaks, because Spark wrongly treats the user data column named `_metadata` as the hidden file source metadata column.

       

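      Code to reproduce:

      // Prepare a file source DataFrame; `inputPath` is a placeholder for any
      // existing Parquet location and `path` is a placeholder output location.
      val df = spark.read.format("parquet").load(inputPath)
      df.select("*", "_metadata")
        .write.format("parquet").save(path)
      // Reading the written files back breaks on affected builds.
      spark.read.format("parquet").load(path)
        .select("*").show()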
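      To make the failure mode concrete, here is a simplified, Spark-free model in Scala. It is not Spark's implementation: `Field` stands in for a schema field carrying metadata, and `MarkerKey` is a hypothetical placeholder for the internal key Spark attaches to the hidden column's `Attribute` metadata. The model shows that if the writer persists that marker, a reader that checks for it mistakes the stored user column `_metadata` for the hidden column. Dropping the marker before writing avoids the confusion in this model, which is not necessarily how the actual fix was implemented.

```scala
// Simplified model of the failure; not Spark code.
object HiddenMetadataModel {
  // Hypothetical stand-in for Spark's internal attribute-metadata key.
  val MarkerKey = "hidden_file_metadata_marker"
  val MetadataColName = "_metadata"

  // A schema field: a name plus a free-form metadata map (cf. StructField.metadata).
  final case class Field(name: String, metadata: Map[String, String] = Map.empty)

  // The field produced when a query selects the hidden column: it carries the marker.
  def selectHidden(): Field = Field(MetadataColName, Map(MarkerKey -> "true"))

  // Writer that persists every field's metadata verbatim into the file schema.
  def writeKeepingMetadata(schema: Seq[Field]): Seq[Field] = schema

  // Writer that drops the marker first, so the file holds plain user data.
  def writeStrippingMarker(schema: Seq[Field]): Seq[Field] =
    schema.map(f => f.copy(metadata = f.metadata - MarkerKey))

  // Reader check: a stored field is treated as the hidden column when it is
  // named `_metadata` and still carries the marker.
  def treatedAsHidden(stored: Field): Boolean =
    stored.name == MetadataColName && stored.metadata.contains(MarkerKey)
}
```

      For example, with `val schema = Seq(Field("id"), selectHidden())`, `writeKeepingMetadata(schema).exists(treatedAsHidden)` is `true`, while `writeStrippingMarker(schema).exists(treatedAsHidden)` is `false`.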


          People

            Assignee: Yaohua Zhao
            Reporter: Yaohua Zhao
            Votes: 0
            Watchers: 3
