Spark / SPARK-27442

ParquetFileFormat fails to read column named with invalid characters


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.0, 2.4.1
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels:
      None

      Description

      When reading a Parquet file that contains a column whose name includes characters Spark considers invalid, the reader fails with the exception:

      Name: org.apache.spark.sql.AnalysisException
      Message: Attribute name "..." contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.

      Spark itself will not write such files, but it should be able to read them (and allow the user to correct the column names). However, the obvious workarounds (renaming the column with an alias, or forcing another schema) do not work, because the check is applied to the input schema before any renaming can take effect.
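      To make the failure mode concrete, here is a minimal sketch of the kind of name validation that produces the exception above. The function name and structure are hypothetical, not Spark's actual code; only the character list and the error text are taken from the reported message.

      ```python
      # Illustrative re-implementation of the attribute-name check that
      # rejects Parquet column names; the character set and message come
      # from the AnalysisException in this report. Not Spark source code.
      INVALID_CHARS = set(" ,;{}()\n\t=")

      def check_field_name(name: str) -> None:
          """Raise if the column name contains a rejected character."""
          if set(name) & INVALID_CHARS:
              raise ValueError(
                  f'Attribute name "{name}" contains invalid character(s) '
                  'among " ,;{}()\\n\\t=". Please use alias to rename it.'
              )

      check_field_name("valid_name")      # accepted: no rejected characters
      try:
          check_field_name("bad name (1)")  # rejected: space and parentheses
      except ValueError as e:
          print(e)
      ```

      Because this check runs against the schema read from the file, a column written with such a name by another tool trips it before any alias or user-supplied schema is consulted.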

      (Possible fix: remove the superfluous ParquetWriteSupport.setSchema(requiredSchema, hadoopConf) call from buildReaderWithPartitionValues?)


            People

            • Assignee:
              Unassigned
            • Reporter:
              Jan Vršovský
            • Votes:
              0
            • Watchers:
              3
