  1. Spark
  2. SPARK-24691

Add new API `supportDataType` in FileFormat


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      In https://github.com/apache/spark/pull/21389, the data source schema is validated. However, there are two problems:

      1. Putting all the validation logic together in `DataSourceUtils` is tricky and hard to maintain. On second thought after review, I found that the `OrcFileFormat` in the hive package is not matched, so its validation is wrong.
      2. `DataSourceUtils.verifyWriteSchema` and `DataSourceUtils.verifyReadSchema` are not supposed to be called in every file format. We can move them to a higher-level entry point.

      So I propose adding a new API `supportDataType` to `FileFormat`. Each file format can override the method to specify which data types it supports and which it does not.
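
      The proposal can be illustrated with a minimal, self-contained sketch. It does not use Spark's real classes: the `DataType` hierarchy, the one-argument `supportDataType` signature, the two example formats, and the `SchemaCheck.unsupportedFields` helper are all hypothetical stand-ins chosen for illustration, and the method that was actually merged may differ.

```scala
// Hypothetical stand-ins for Spark's data types, so the sketch runs without Spark.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object CalendarIntervalType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

// Simplified FileFormat: each format declares which types it can handle.
trait FileFormat {
  def name: String
  // Permissive default; this signature is an assumption made for the sketch.
  def supportDataType(dataType: DataType): Boolean = true
}

// A flat, text-like format (illustrative): only atomic types are accepted.
object CsvLikeFormat extends FileFormat {
  val name = "CsvLike"
  override def supportDataType(dataType: DataType): Boolean = dataType match {
    case IntegerType | StringType => true
    case _                        => false
  }
}

// A nested format (illustrative): recurses into arrays and structs,
// and rejects interval values wherever they appear.
object NestedFormat extends FileFormat {
  val name = "Nested"
  override def supportDataType(dataType: DataType): Boolean = dataType match {
    case CalendarIntervalType => false
    case ArrayType(element)   => supportDataType(element)
    case StructType(fields)   => fields.forall(f => supportDataType(f.dataType))
    case _                    => true
  }
}

// One shared entry point performs the check, instead of every format
// calling a verify method itself.
object SchemaCheck {
  def unsupportedFields(format: FileFormat, schema: StructType): Seq[String] =
    schema.fields.filterNot(f => format.supportDataType(f.dataType)).map(_.name)
}
```

      With this shape, the per-format rules live next to each format, and a caller only needs something like `SchemaCheck.unsupportedFields(CsvLikeFormat, schema)` to find the offending columns before a read or write.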

       

          People

            Assignee: Gengliang Wang
            Reporter: Gengliang Wang
            Votes: 0
            Watchers: 4
