HCATALOG-487: HCatalog should tolerate a user-defined amount of bad records

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5
    • Component/s: None
    • Labels: None

    Description

      HCatalog tasks currently fail when deserializing corrupt records. In some cases, large data sets contain a small number of corrupt records, and it's acceptable to skip them. In fact, Hadoop supports skipping bad records for exactly this reason.

      However, the Hadoop-native record-skipping feature (which Hive uses) is very coarse: it produces a large number of failed tasks, incurs task-scheduling overhead, and offers limited control over the skipping behavior.

      HCatalog should have native support for skipping a user-defined number of bad records.
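      A minimal sketch of the idea, not the actual HCATALOG-487 patch: a small counter class tracks deserialization failures and permits skipping a record only while the count stays within a user-configured threshold (the class and method names here are illustrative assumptions).

      ```java
      // Hypothetical sketch of threshold-based bad-record skipping.
      // The class name, method names, and configuration source are
      // illustrative; they are not the names used in the real patch.
      public class BadRecordTracker {
          // Maximum number of corrupt records a task may skip,
          // e.g. read from a job configuration property.
          private final long maxBadRecords;
          private long badRecords = 0;

          public BadRecordTracker(long maxBadRecords) {
              this.maxBadRecords = maxBadRecords;
          }

          /**
           * Record one corrupt record. Returns true if the record may be
           * skipped; false once the threshold is exceeded, at which point
           * the caller should fail the task as before.
           */
          public boolean permitSkip() {
              badRecords++;
              return badRecords <= maxBadRecords;
          }

          public long getBadRecordCount() {
              return badRecords;
          }

          public static void main(String[] args) {
              BadRecordTracker tracker = new BadRecordTracker(2);
              System.out.println(tracker.permitSkip()); // true  (1st bad record)
              System.out.println(tracker.permitSkip()); // true  (2nd bad record)
              System.out.println(tracker.permitSkip()); // false (threshold exceeded)
          }
      }
      ```

      A record reader would call `permitSkip()` in the catch block of its deserialization loop, logging and dropping the record on true and rethrowing on false, so a handful of corrupt records no longer fails the whole task.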

      Attachments

        1. HCATALOG-487_skip_bad_records.1.patch
          14 kB
          Travis Crawford
        2. HCATALOG-487_skip_bad_records.2.patch
          15 kB
          Travis Crawford

          People

            Assignee: Travis Crawford (traviscrawford)
            Reporter: Travis Crawford (traviscrawford)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: