[TEZ-3912] Fetchers should be more robust to corrupted inputs - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.2, 0.10.0
Component/s: None
Labels: None

Description

I recently saw a case where a bad node in the cluster produced corrupted shuffle data, which caused the codec to throw IllegalArgumentException during a fetch. Fetchers currently handle only IOException and InternalError; any other type of exception causes the entire task to be torn down. We should consider catching Exception, as MapReduce does, so that fetchers are more robust to other types of errors coming from the codec and retries can occur.
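
The proposal could be sketched roughly as follows. This is a minimal illustration, not the actual Tez Fetcher code or the attached patch: the class `RobustFetchSketch` and the helpers `fetchOnce` and `fetchWithRetries` are hypothetical names, and the copy step is modeled as a plain `Callable`. It only shows the idea of treating any Exception (plus InternalError) from the codec as a failed, retryable attempt.

```java
import java.util.concurrent.Callable;

/** Illustrative sketch only; not the actual Tez Fetcher code. */
public class RobustFetchSketch {

    /**
     * Runs one fetch attempt (hypothetical helper). Returns true on success.
     * Any Exception, or an InternalError from a native codec, is treated as
     * a failed attempt instead of propagating and tearing down the task.
     */
    public static boolean fetchOnce(Callable<Void> copyAttempt) {
        try {
            copyAttempt.call();
            return true;
        } catch (InternalError | Exception e) {
            if (e instanceof InterruptedException) {
                // Preserve the interrupt so the caller can stop retrying.
                Thread.currentThread().interrupt();
            }
            System.err.println("Fetch attempt failed: " + e);
            return false;
        }
    }

    /** Retries the attempt up to maxAttempts times (assumed retry policy). */
    public static boolean fetchWithRetries(Callable<Void> copyAttempt, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fetchOnce(copyAttempt)) {
                return true;
            }
            if (Thread.currentThread().isInterrupted()) {
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulates corrupted data on the first attempt only.
        Callable<Void> flaky = () -> {
            if (calls[0]++ == 0) {
                throw new IllegalArgumentException("simulated corrupt shuffle data");
            }
            return null;
        };
        boolean ok = fetchWithRetries(flaky, 3);
        System.out.println("fetched=" + ok + " attempts=" + calls[0]);
    }
}
```

With only IOException and InternalError caught, the simulated IllegalArgumentException would escape `fetchOnce` on the first attempt; widening the catch lets the second attempt run.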

Attachments

- TEZ-3912.001.patch (3 kB), 27/Jun/18 18:55, Kuhu Shukla
- TEZ-3912.002.patch (5 kB), 06/Jul/18 15:43, Kuhu Shukla

Issue Links

relates to: TEZ-3196 java.lang.InternalError from decompression codec is fatal to a task during shuffle (Closed)

People

Assignee: Kuhu Shukla

Reporter: Jason Darrell Lowe

Votes: 0

Watchers: 4

Dates

Created: 09/Apr/18 14:18

Updated: 19/Apr/19 19:21

Resolved: 09/Jul/18 15:51