Spark / SPARK-38639

Support ignoreCorruptRecord flag to ensure querying broken sequence file table smoothly

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2, 3.2.1
    • Fix Version/s: 3.2.1
    • Component/s: SQL
    • Labels: None

    Description

      The existing flags "spark.sql.files.ignoreCorruptFiles" and "spark.sql.files.ignoreMissingFiles" quietly skip reads from files that are corrupted or missing, but a query on sequence files can still fail.

       

      Being able to ignore corrupt records is useful when users want queries to succeed on dirty data (mixed schemas in one table).

       

      We would like to add a "spark.sql.hive.ignoreCorruptRecord" flag to fill this gap.
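
The intended behavior can be illustrated with a minimal sketch in plain Python. This is not Spark code and not the proposed patch: `decode_record`, `read_records`, `CorruptRecordError`, and the `key=value` record layout are all hypothetical, and the assumption that the flag skips a single undecodable record (rather than the rest of the file) is ours, since the issue does not specify it.

```python
from typing import Callable, Dict, Iterable, Iterator


class CorruptRecordError(Exception):
    """Raised by the hypothetical decoder when a record cannot be parsed."""


def decode_record(raw: bytes) -> Dict[str, str]:
    # Hypothetical record layout: UTF-8 text of the form "key=value".
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid bytes
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise CorruptRecordError(f"malformed record: {raw!r}")
    return {"key": key, "value": value}


def read_records(
    raw_records: Iterable[bytes],
    decode: Callable[[bytes], Dict[str, str]],
    ignore_corrupt_record: bool = False,
) -> Iterator[Dict[str, str]]:
    """Yield decoded records; optionally skip the ones that fail to decode.

    `ignore_corrupt_record` stands in for the proposed
    "spark.sql.hive.ignoreCorruptRecord" setting (assumed semantics).
    """
    for raw in raw_records:
        try:
            yield decode(raw)
        except (CorruptRecordError, UnicodeDecodeError):
            if not ignore_corrupt_record:
                # Flag off: the first bad record fails the whole read.
                raise
            # Flag on: drop this record and continue with the next one.
```

With the flag on, reading `[b"a=1", b"\xff\xfe", b"no-separator", b"b=2"]` yields only the two well-formed records; with the flag off, the read fails at the first undecodable record.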

          People

            Assignee: Unassigned
            Reporter: tonydoen
            Votes: 0
            Watchers: 2

              Time Tracking

                Original Estimate: 48h
                Remaining Estimate: 48h
                Time Spent: Not Specified