Spark / SPARK-48649

Add "ignoreInvalidPartitionPaths" and "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring invalid partition paths


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      When a table directory contains invalid partition paths, such as:

      table/
        invalid/...
        part=1/...
        part=2/...
        part=3/...

      a SQL query that reads all of the partitions fails with:

      java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths: 
       table 
       table/invalid 

       

      I propose adding a data source option and a Spark SQL config to ignore invalid partition paths. The config will be disabled by default to retain the current behaviour.

      spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "true")
      spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(...)  
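
      To illustrate the intended effect, here is a simplified pure-Python model of partition discovery. It is only a sketch under stated assumptions, not Spark's actual implementation: the names parse_partition, discover_partitions and ignore_invalid_partition_paths are hypothetical, and the model assumes that a directory without any name=value segment is treated as its own base path, which is what produces the two conflicting paths in the error above.

```python
from pathlib import PurePosixPath


def parse_partition(leaf_dir, table_root):
    """Return (partition_values, base_path) for one leaf directory.

    Hypothetical helper: walks from the leaf towards table_root and
    collects `name=value` segments. A directory with no such segment
    yields no partition values and is reported as its own base path.
    """
    values = {}
    current = PurePosixPath(leaf_dir)
    root = PurePosixPath(table_root)
    while current != root and "=" in current.name:
        key, _, value = current.name.partition("=")
        values[key] = value
        current = current.parent
    if not values:
        return None, PurePosixPath(leaf_dir)
    return values, current


def discover_partitions(leaf_dirs, table_root,
                        ignore_invalid_partition_paths=False):
    """Model of the proposed flag (assumption, not Spark's code).

    With the flag off, mixed base paths raise an AssertionError.
    With the flag on, directories without partition values are dropped
    before the base paths are compared.
    """
    parsed = [parse_partition(d, table_root) for d in leaf_dirs]
    if ignore_invalid_partition_paths:
        parsed = [(v, b) for v, b in parsed if v is not None]
    base_paths = sorted({str(b) for _, b in parsed})
    if len(base_paths) > 1:
        raise AssertionError(
            "Conflicting directory structures detected. Suspicious paths:\n"
            + "\n".join(base_paths)
        )
    return [v for v, _ in parsed if v is not None]


# The layout from the description above.
LEAF_DIRS = ["table/invalid", "table/part=1", "table/part=2", "table/part=3"]
```

      Calling discover_partitions(LEAF_DIRS, "table") raises the assertion in this model, while passing ignore_invalid_partition_paths=True skips table/invalid and returns the values for part=1 through part=3.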

            People

              Assignee: Ivan Sadikov
              Reporter: Ivan Sadikov
              Votes: 0
              Watchers: 2
