Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 2.2
None
Description
When a remote fragment fails, the coordinator log shows only the Cancel() message, not the actual error message:
I0906 07:36:36.415954 13332 coordinator.cc:386] starting 2 backends for query 634caf4174771cbc:fe4a29d380c2ddbe
I0906 07:36:36.775049 23369 plan-fragment-executor.cc:300] Open(): instance_id=634caf4174771cbc:fe4a29d380c2ddbf
I0906 07:36:37.723321 20510 coordinator.cc:1316] Backend 1 completed, 1 remaining: query_id=634caf4174771cbc:fe4a29d380c2ddbe
I0906 07:36:37.723358 20510 coordinator.cc:1325] query_id=634caf4174771cbc:fe4a29d380c2ddbe: first in-progress backend: remote-impalad-host:22000
I0906 07:36:37.734156 22711 coordinator.cc:1131] Cancel() query_id=634caf4174771cbc:fe4a29d380c2ddbe
Of course, the real error message is logged on the remote (backend) host, for example:
I0906 07:36:37.732405 15306 runtime-state.cc:230] Error from query 634caf4174771cbc:fe4a29d380c2ddbe: Create file /impala/impalad/impala-scratch/634caf4174771cbc:fe4a29d380c2ddbe_663a9199-63b3-4d5e-be84-297e05a99970 failed with errno=2 description=Error(2): No such file or directory
The error message is recorded in the query profile, but it is still hard to track, trace, or monitor why the query actually failed.
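As long as the coordinator log shows only Cancel(), the underlying error has to be found by searching the backend logs for the query id. The following is a minimal sketch of that lookup, assuming glog-style lines shaped like the excerpts above. The names `GLOG_LINE` and `find_query_errors` are hypothetical helpers written for this illustration and are not part of Impala.

```python
import re

# Assumed glog prefix: severity letter + MMDD, time, thread id, "file:line]".
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)


def find_query_errors(log_lines, query_id):
    """Return (time, source, error_text) for each 'Error from query' line
    that belongs to the given query id."""
    marker = "Error from query " + query_id + ": "
    errors = []
    for line in log_lines:
        match = GLOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed glog format
        message = match.group("message")
        if message.startswith(marker):
            # Keep only the text after the "Error from query <id>: " prefix.
            errors.append(
                (match.group("time"), match.group("source"), message[len(marker):])
            )
    return errors
```

Called with the lines of a backend log file and the query id taken from the coordinator's Cancel() line, it returns the timestamp, source location, and error text of each matching entry. It only covers errors logged in the "Error from query" form shown above.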