Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.13.0
-
None
Description
Queries fail sporadically on some clusters with the following error:
oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION ERROR: Exceeded timeout (25000) while waiting send intermediate work fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
This error occurs because FragmentsRunner uses a hardcoded timeout, RPC_WAIT_IN_MSECS_PER_FRAGMENT, which is set to 5 seconds per fragment. That is consistent with the 25000 ms limit reported for the 5 fragments in the message above. Increasing the timeout to 10 seconds resolved the sporadic failures that were observed. The default should be changed to 10 seconds, and the value should also be configurable via the SystemOptionManager.
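To illustrate the proposal, here is a minimal, self-contained sketch of a per-fragment wait whose value is passed in rather than hardcoded. It is not the actual FragmentsRunner code: the class FragmentTimeoutSketch, the methods totalTimeoutMs and awaitAcks, and the constant DEFAULT_RPC_WAIT_MS_PER_FRAGMENT are hypothetical names. The assumption that the total wait is the per-fragment value multiplied by the number of fragments sent is inferred from the error message. In Drill the per-fragment value would presumably be read from the SystemOptionManager; here it is a plain parameter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FragmentTimeoutSketch {

    // Hypothetical default, matching the 10 seconds proposed in this issue.
    static final long DEFAULT_RPC_WAIT_MS_PER_FRAGMENT = 10_000L;

    // Assumption: the total wait scales linearly with the number of fragments sent.
    static long totalTimeoutMs(long perFragmentMs, int fragmentCount) {
        if (perFragmentMs <= 0 || fragmentCount < 0) {
            throw new IllegalArgumentException("timeout must be positive and count non-negative");
        }
        return Math.multiplyExact(perFragmentMs, (long) fragmentCount);
    }

    // Waits for every remote node to acknowledge its fragment.
    // Returns false if the wait times out or the thread is interrupted.
    static boolean awaitAcks(CountDownLatch acks, long perFragmentMs, int fragmentCount) {
        try {
            return acks.await(totalTimeoutMs(perFragmentMs, fragmentCount), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        // Old hardcoded value: 5 fragments at 5 seconds each.
        System.out.println(totalTimeoutMs(5_000L, 5)); // prints 25000
        // Proposed default: 5 fragments at 10 seconds each.
        System.out.println(totalTimeoutMs(DEFAULT_RPC_WAIT_MS_PER_FRAGMENT, 5)); // prints 50000
    }
}
```

With the per-fragment value as a parameter, a cluster that still sees slow acknowledgements can raise the limit without a code change.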