Description
The Map phase of INSERTs into a bucketed table fails intermittently with the error "Vertex <vertex_id> [Map 1] failed as task <task_id> failed after vertex succeeded.]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0".
The task fails because every attempt fails with "<attempt_id> being failed for too many output errors. failureFraction=0.2, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0"
This happens more often when the table is ACID-enabled and a delete operation is run before the inserts.
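To make it easier to see which limit in the attempt error is being tripped, here is a minimal Python sketch that parses the message and compares each reported value with its limit. The pairing of values to limits and the strict "greater than" comparison are assumptions inferred from the field names only, not confirmed against the Tez source, and the helper names (parse_fields, exceeded_limits) are hypothetical.

```python
import re

# Diagnostic fields copied from the attempt error quoted above.
DIAGNOSTIC = (
    "failureFraction=0.2, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, "
    "uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, "
    "MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0"
)

# ASSUMPTION: each observed value is checked against the similarly named limit.
PAIRS = [
    ("failureFraction", "MAX_ALLOWED_OUTPUT_FAILURES_FRACTION"),
    ("uniquefailedOutputReports", "MAX_ALLOWED_OUTPUT_FAILURES"),
    ("readErrorTimespan", "MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC"),
]


def parse_fields(message):
    """Return every name=number pair found in the message as floats."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)=([0-9.]+)", message)}


def exceeded_limits(message):
    """List the observed values that are strictly greater than their limit."""
    fields = parse_fields(message)
    return [observed for observed, limit in PAIRS
            if fields[observed] > fields[limit]]


print(exceeded_limits(DIAGNOSTIC))
```

Under these assumptions only failureFraction (0.2 against a limit of 0.1) is over its limit. The single unique failed output report and the read-error timespan of 0 are both well within theirs.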
I have tried the following:
Changed tez.am.launch.cmd-opts, tez.task.launch.cmd-opts and hive.tez.java.opts to use parallel GC.
tez.runtime.shuffle.max.allowed.failed.fetch.fraction = 0.95
tez.runtime.shuffle.failed.check.since-last.completion = false
tez.runtime.shuffle.fetch.buffer.percent = 0.1
tez.runtime.shuffle.memory.limit.percent = 0.25
tez.runtime.shuffle.ssl.enable = false
Deleted ".../usercache/<user>/filecache" and ".../usercache/<user>/appcache"
I am using the HDP 2.6 distribution.