Spark / SPARK-27589

Spark file source V2

    Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

      Description

      Re-implement file sources with data source V2 API

        Sub-Tasks

        1. Implement `doCanonicalize` in BatchScanExec for comparing query plan results (Sub-task, Resolved, Gengliang Wang)
        2. File source V2: return correct result for Dataset.inputFiles() (Sub-task, Resolved, Gengliang Wang)
        3. File source V2: support refreshing metadata cache (Sub-task, Resolved, Gengliang Wang)
        4. Revise the exception message of schema inference failure in file source V2 (Sub-task, Resolved, Gengliang Wang)
        5. File source V2 table provider should be compatible with V1 provider (Sub-task, Resolved, Gengliang Wang)
        6. Support UDF input_file_name in file source V2 (Sub-task, Resolved, Gengliang Wang)
        7. Support schema pruning in Orc V2 (Sub-task, Resolved, Gengliang Wang)
        8. Migrate Parquet to File Data Source V2 (Sub-task, Resolved, Gengliang Wang)
        9. File source V2: Invalidate cache data on overwrite/append (Sub-task, Resolved, Gengliang Wang)
        10. File source V2: Prune unnecessary partition columns (Sub-task, Resolved, Gengliang Wang)
        11. File source V2: return actual schema in method `FileScan.readSchema` (Sub-task, Resolved, Gengliang Wang)
        12. Fall back all v2 file sources in `InsertIntoTable` to V1 FileFormat (Sub-task, Resolved, Gengliang Wang)
        13. File source V2: Ignore empty files in load (Sub-task, Resolved, Gengliang Wang)
        14. Handles exceptions on proceeding to next record in FilePartitionReader (Sub-task, Resolved, Gengliang Wang)
        15. Migrate Text to File Data Source V2 (Sub-task, Resolved, Unassigned)
        16. File source v2 should validate data schema only (Sub-task, Resolved, Gengliang Wang)
        17. Improve file source V2 framework (Sub-task, Resolved, Gengliang Wang)
        18. Migrate JSON to File Data Source V2 (Sub-task, Resolved, Gengliang Wang)
        19. Migrate CSV to File Data Source V2 (Sub-task, Resolved, Gengliang Wang)
        20. Remove data source option check_files_exist (Sub-task, Resolved, Gengliang Wang)
        21. Support handling partition values in the abstraction of file source V2 (Sub-task, Resolved, Gengliang Wang)
        22. File Source V2: avoid creating unnecessary FileIndex in the write path (Sub-task, Resolved, Gengliang Wang)
        23. Support schema validation in File Source V2 (Sub-task, Resolved, Gengliang Wang)
        24. File source V2 write: create framework and migrate ORC to it (Sub-task, Resolved, Gengliang Wang)
        25. Allow OrcColumnarBatchReader to return less partition columns (Sub-task, Resolved, Gengliang Wang)
        26. Create file source V2 framework and migrate ORC read path (Sub-task, Resolved, Gengliang Wang)
        27. File source V2: support reporting statistics (Sub-task, Resolved, Gengliang Wang)
        28. Redact treeString of FileTable and DataSourceV2ScanExecBase (Sub-task, Resolved, Gengliang Wang)
        29. Allow altering table add columns with CSVFileFormat/JsonFileFormat provider (Sub-task, Resolved, Gengliang Wang)
        30. File source v2: support reading output of file streaming Sink (Sub-task, Resolved, Gengliang Wang)
        31. useV1SourceList configuration should be for all data sources (Sub-task, Resolved, Gengliang Wang)
        32. Migrate Avro to File source V2 (Sub-task, Resolved, Gengliang Wang)
        33. Add PathCatalog for data source V2 (Sub-task, Resolved, Unassigned)
        34. File source V2: support partition pruning (Sub-task, Resolved, Gengliang Wang)
        35. Disable all the V2 file sources in Spark 3.0 by default (Sub-task, Resolved, Gengliang Wang)
        36. File source V2: Support partition pruning with subqueries (Sub-task, Open, Unassigned)
        37. Add V1/V2 tests for TextSuite and WholeTextFileSuite (Sub-task, Resolved, Gengliang Wang)

            People

            • Assignee: Unassigned
            • Reporter: Gengliang Wang
            • Votes: 0
            • Watchers: 3