Spark / SPARK-40918

Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.2
    • Component/s: SQL
    • Labels: None

    Description

      The _metadata columns are taken into account in FileSourceScanExec.supportColumnar, but not when the Parquet reader is created. As a result, the Parquet reader can produce columnar output (because without the metadata columns its field count stays within the WSCG.isTooManyFields limit), whereas FileSourceScanExec expects row output (because with the extra metadata columns the field count exceeds the isTooManyFields limit).

      I have a fix forthcoming.
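
      The mismatch can be illustrated with a small standalone model. This is only a sketch, not Spark's code: the function names are hypothetical, the limit of 100 is assumed to mirror the default of spark.sql.codegen.maxFields, and the count of 4 metadata fields is an assumption chosen for illustration.

```python
# Simplified model of the mismatch; not Spark's actual code.
# MAX_FIELDS is assumed to mirror the default of spark.sql.codegen.maxFields.
MAX_FIELDS = 100


def is_too_many_fields(num_fields: int, max_fields: int = MAX_FIELDS) -> bool:
    """Hypothetical stand-in for the WSCG.isTooManyFields check."""
    return num_fields > max_fields


def reader_returns_batches(num_data_cols: int,
                           max_fields: int = MAX_FIELDS) -> bool:
    """Reader-side decision: counts only the data columns."""
    return not is_too_many_fields(num_data_cols, max_fields)


def scan_supports_columnar(num_data_cols: int, num_metadata_fields: int,
                           max_fields: int = MAX_FIELDS) -> bool:
    """Scan-side decision: counts data columns plus _metadata fields."""
    total = num_data_cols + num_metadata_fields
    return not is_too_many_fields(total, max_fields)


def decisions_disagree(num_data_cols: int, num_metadata_fields: int,
                       max_fields: int = MAX_FIELDS) -> bool:
    """True when the reader would emit batches but the scan expects rows,
    or vice versa."""
    reader = reader_returns_batches(num_data_cols, max_fields)
    scan = scan_supports_columnar(num_data_cols, num_metadata_fields,
                                  max_fields)
    return reader != scan


if __name__ == "__main__":
    # 98 data columns stay under the limit, 98 + 4 do not: the two sides
    # make different choices.
    print(decisions_disagree(98, 4))
    # 50 data columns: both sides stay under the limit and agree.
    print(decisions_disagree(50, 4))
```

      In this model the two decisions only diverge in the narrow band where the data columns alone fit under the limit but the data columns plus metadata fields do not, which suggests why the problem needs a wide schema together with _metadata to show up.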

          People

            Assignee: Juliusz Sompolski (juliuszsompolski)
            Reporter: Juliusz Sompolski (juliuszsompolski)
