Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 2.3.0, 2.4.0
Fix Version/s: None
Component/s: None
Labels: None
Description
Currently, the ShuffleHeader (which is a Writable) simply tries to read the header of a successful map output (mapid, reduceid, etc.). If an error occurred, the input contains an error message instead of those fields. Parsing the ShuffleHeader then fails, and because we don't know where the error message ends, we cannot consume the rest of the input stream, which may still contain good data from the remaining map outputs. Encoding the error in the ShuffleHeader itself would let us parse the error correctly and move on to the remaining data.
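A minimal sketch of the idea, assuming a hypothetical `ErrorAwareShuffleHeader` class (not the actual Hadoop/Tez `ShuffleHeader`; its fields and wire layout are illustrative). It writes a status flag right after the map and reduce identifiers, followed by either the payload length or a length-prefixed error message. Because every entry is self-delimiting, a reader can skip the payload of a good map output, parse an error entry, and continue with the next header. The `write`/`readFields` method names only mirror the shape of Hadoop's `Writable`; the class does not implement that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical header that encodes success or failure explicitly. */
class ErrorAwareShuffleHeader {
    String mapId;
    int reduceId;
    boolean ok;            // status flag, written before any status-specific field
    long compressedLength; // payload length in bytes; meaningful only when ok
    String errorMessage;   // meaningful only when !ok

    static ErrorAwareShuffleHeader success(String mapId, int reduceId, long compressedLength) {
        ErrorAwareShuffleHeader h = new ErrorAwareShuffleHeader();
        h.mapId = mapId;
        h.reduceId = reduceId;
        h.ok = true;
        h.compressedLength = compressedLength;
        return h;
    }

    static ErrorAwareShuffleHeader error(String mapId, int reduceId, String errorMessage) {
        ErrorAwareShuffleHeader h = new ErrorAwareShuffleHeader();
        h.mapId = mapId;
        h.reduceId = reduceId;
        h.ok = false;
        h.errorMessage = errorMessage;
        return h;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(mapId); // writeUTF is length-prefixed, so the reader knows where it ends
        out.writeInt(reduceId);
        out.writeBoolean(ok);
        if (ok) {
            out.writeLong(compressedLength);
        } else {
            out.writeUTF(errorMessage);
        }
    }

    void readFields(DataInput in) throws IOException {
        mapId = in.readUTF();
        reduceId = in.readInt();
        ok = in.readBoolean();
        compressedLength = ok ? in.readLong() : 0L;
        errorMessage = ok ? null : in.readUTF();
    }

    /** Writes header + payload for successful maps, header only for failed ones. */
    static byte[] encode(List<ErrorAwareShuffleHeader> headers, List<byte[]> payloads) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        try {
            for (int i = 0; i < headers.size(); i++) {
                ErrorAwareShuffleHeader h = headers.get(i);
                h.write(out);
                if (h.ok) {
                    byte[] payload = payloads.get(i);
                    if (payload.length != h.compressedLength) {
                        throw new IllegalArgumentException("payload length mismatch for " + h.mapId);
                    }
                    out.write(payload);
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Parses every entry; consuming each good payload keeps later headers aligned. */
    static List<ErrorAwareShuffleHeader> readAll(byte[] response) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(response));
        List<ErrorAwareShuffleHeader> headers = new ArrayList<>();
        try {
            while (in.available() > 0) {
                ErrorAwareShuffleHeader h = new ErrorAwareShuffleHeader();
                h.readFields(in);
                if (h.ok) {
                    in.readFully(new byte[(int) h.compressedLength]); // skip map output data
                }
                headers.add(h);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headers;
    }
}
```

With a response containing a good map output, a failed one, and another good one, `readAll` returns all three headers instead of aborting at the failure. A production format would likely also want a version marker so old and new readers can coexist; that is outside this sketch.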
The shuffle handler response should state which maps failed and which succeeded, along with the error for each failed map. This will provide diagnostics that are easier to report upstream.
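A sketch of the per-map summary the consumer could build once each entry carries its own status. `ShuffleResponseReport`, its methods, and the diagnostic string format are hypothetical illustrations, not an existing Hadoop or Tez API. It records successful map IDs in order and keeps an insertion-ordered map from failed map ID to error message, then renders a one-line diagnostic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-response summary: which maps succeeded, which failed and why. */
class ShuffleResponseReport {
    private final List<String> succeeded = new ArrayList<>();
    // mapId -> error message, kept in the order the entries arrived
    private final Map<String, String> failed = new LinkedHashMap<>();

    void recordSuccess(String mapId) {
        succeeded.add(mapId);
    }

    void recordFailure(String mapId, String error) {
        failed.put(mapId, error);
    }

    List<String> succeeded() {
        return Collections.unmodifiableList(succeeded);
    }

    Map<String, String> failed() {
        return Collections.unmodifiableMap(failed);
    }

    boolean hasFailures() {
        return !failed.isEmpty();
    }

    /** One-line diagnostic naming every failed map and its error. */
    String diagnostics(String host) {
        StringBuilder sb = new StringBuilder();
        sb.append("shuffle from ").append(host).append(": ")
          .append(succeeded.size()).append(" ok, ")
          .append(failed.size()).append(" failed");
        for (Map.Entry<String, String> e : failed.entrySet()) {
            sb.append("; ").append(e.getKey()).append(": ").append(e.getValue());
        }
        return sb.toString();
    }
}
```

For example, recording `map_0` and `map_2` as successes and `map_1` as failed with `"disk error"` makes `diagnostics("host1")` return `shuffle from host1: 2 ok, 1 failed; map_1: disk error`. Keeping successes and failures separate lets the caller retry only the failed maps while still reporting the exact error for each.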
Issue Links
- relates to: TEZ-1223 Shuffle errors at 10 TB scale (Open)