Hive does not return meaningful error messages for runtime errors. In addition, the same error code is returned for many unrelated errors, so a programmatic caller cannot decide whether it should retry or give up. This JIRA will get the ball rolling on having Hive return useful error codes and display useful messages when something goes wrong. I propose the following partitioning of error codes:
10000 to 19999: Errors that occur during semantic analysis and compilation of the query. Hive already does a pretty good job of reporting these, so error codes will simply be attached to the existing error messages.
20000 to 29999: Runtime errors where Hive believes that a retry will not succeed, so the caller should not retry.
30000 to 39999: Runtime errors that Hive believes are probably transient, so a retry may succeed.
40000 to 49999: Runtime errors where Hive cannot say whether a retry will succeed. We want to use this range as little as possible.
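A minimal sketch of how a programmatic caller might use these ranges to decide whether to retry. The enum, the method names and the sample codes are hypothetical illustrations, not existing Hive API; the policy of retrying only the 30000 to 39999 range (and treating 40000 to 49999 as "do not retry") is an assumption a real caller could choose differently.

```java
public class HiveErrorRanges {

    /** Hypothetical categories mirroring the proposed error-code ranges. */
    enum ErrorCategory {
        COMPILE_ERROR,   // 10000 to 19999: semantic analysis / compilation
        NON_RETRYABLE,   // 20000 to 29999: retrying will not succeed
        TRANSIENT,       // 30000 to 39999: retrying may succeed
        UNKNOWN_RUNTIME, // 40000 to 49999: Hive cannot tell
        UNRECOGNIZED     // outside every proposed range
    }

    /** Maps an error code to its category using the proposed range boundaries. */
    static ErrorCategory classify(int code) {
        if (code >= 10000 && code <= 19999) return ErrorCategory.COMPILE_ERROR;
        if (code >= 20000 && code <= 29999) return ErrorCategory.NON_RETRYABLE;
        if (code >= 30000 && code <= 39999) return ErrorCategory.TRANSIENT;
        if (code >= 40000 && code <= 49999) return ErrorCategory.UNKNOWN_RUNTIME;
        return ErrorCategory.UNRECOGNIZED;
    }

    /** Caller-side policy (an assumption): retry only errors Hive marks as probably transient. */
    static boolean shouldRetry(int code) {
        return classify(code) == ErrorCategory.TRANSIENT;
    }

    public static void main(String[] args) {
        // Made-up sample codes, one per range plus one outside all ranges.
        int[] samples = {10001, 20000, 30042, 40000, 50000};
        for (int code : samples) {
            System.out.println(code + " -> " + classify(code) + ", retry=" + shouldRetry(code));
        }
    }
}
```

Because the category is derived purely from the numeric range, a caller written this way keeps working as new codes are added within each range.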
Once this is in place, we can migrate errors that occur in Hive operators to this scheme over time. This patch will set up the error code space, add a mechanism for failed MapReduce tasks to relay the error code back to the Hive client, and apply the new scheme to a couple of common errors.
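One possible shape for the relay mechanism, sketched under assumptions: the failing task prefixes its diagnostic message with a tag such as `[Error 30001]: `, and the client scans the diagnostic strings of failed task attempts for the first such tag. The tag format and the `encode`/`extract` helpers are hypothetical; this does not describe how the patch actually transports the code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ErrorCodeRelay {

    // Hypothetical tag format: "[Error NNNNN]" with a five-digit code.
    private static final Pattern CODE_PATTERN = Pattern.compile("\\[Error (\\d{5})\\]");

    /** Task side (hypothetical): embed the code in the failure message it reports. */
    static String encode(int errorCode, String message) {
        return "[Error " + errorCode + "]: " + message;
    }

    /** Client side (hypothetical): return the first error code found in the task diagnostics. */
    static OptionalInt extract(Iterable<String> diagnostics) {
        for (String diagnostic : diagnostics) {
            Matcher m = CODE_PATTERN.matcher(diagnostic);
            if (m.find()) {
                return OptionalInt.of(Integer.parseInt(m.group(1)));
            }
        }
        // No tagged message: the caller has to fall back to a generic error.
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        List<String> diagnostics = Arrays.asList(
                "java.io.IOException: Connection reset",
                encode(30001, "hypothetical transient failure"));
        System.out.println(extract(diagnostics));
    }
}
```

Untagged diagnostics are skipped, so tasks that have not been migrated to the new scheme simply yield no code rather than a wrong one.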