[SPARK-38639] Support ignoreCorruptRecord flag to ensure querying broken sequence file table smoothly - ASF JIRA


Details

Type: Improvement
Status: In Progress
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.1.2, 3.2.1
Fix Version/s: 3.2.1
Component/s: SQL
Labels: None

Description

There are existing flags, "spark.sql.files.ignoreCorruptFiles" and "spark.sql.files.ignoreMissingFiles", that quietly skip reads from corrupted and missing files respectively, but queries over sequence files can still fail.

Being able to ignore corrupt records is useful when users want queries to succeed over dirty data (for example, mixed schemas in one table).

We would like to add a "spark.sql.hive.ignoreCorruptRecord" flag to fill this gap.
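The per-record behaviour requested here can be sketched in plain Python, independent of Spark: a decode failure on one record is either re-raised (failing the query) or skipped so the remaining records are still returned. This is a minimal sketch assuming records arrive as raw bytes and that a bad record surfaces as a `ValueError`; `read_records`, `decode_int_pair`, and the `key<TAB>int` record layout are hypothetical and do not reflect Spark's or Hive's actual SequenceFile reader or the proposed flag's final semantics.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def read_records(
    raw_records: Iterable[bytes],
    decode: Callable[[bytes], T],
    ignore_corrupt_records: bool = False,
) -> Iterator[T]:
    """Decode records one by one, optionally skipping ones that fail to decode.

    Hypothetical helper: ignore_corrupt_records stands in for the *proposed*
    spark.sql.hive.ignoreCorruptRecord flag; it is not a real Spark API.
    """
    for raw in raw_records:
        try:
            record = decode(raw)
        except ValueError:  # also covers UnicodeDecodeError
            if ignore_corrupt_records:
                # Skip only this record and keep reading the rest of the input.
                continue
            raise  # default: a single bad record fails the whole read
        yield record


def decode_int_pair(raw: bytes) -> Tuple[str, int]:
    """Hypothetical decoder for a 'key<TAB>int' record (assumed layout)."""
    key, value = raw.decode("utf-8").split("\t")
    return key, int(value)
```

For example, with a mixed-schema input such as `[b"a\t1", b"b\tnot-a-number", b"c\t3"]`, enabling the flag yields `[("a", 1), ("c", 3)]`, while the default raises `ValueError` on the second record. Skipping at record granularity, rather than at file granularity as the existing file-level flags do, keeps the valid rows that follow a bad one.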


Issue Links

links to

[Github] Pull Request #35954 (TonyDoen)

[Github] Pull Request #35962 (TonyDoen)

[Github] Pull Request #35963 (TonyDoen)

[Github] Pull Request #35990 (TonyDoen)

[Github] Pull Request #37341 (caican00)



People

Assignee: Unassigned

Reporter: tonydoen

Votes: 0

Watchers: 2

Dates

Due: 23/Mar/22

Created: 23/Mar/22 19:28

Updated: 29/Jul/22 08:28

Time Tracking

Estimated: 48h

Remaining: 48h

Logged: Not Specified