Apache Drill / DRILL-6143

Make Fragment Runner's RPC Timeout a SystemOption


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13.0
    • Fix Version/s: 1.13.0
    • None

    Description

      Queries sporadically fail on some clusters with the following error:

      oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION ERROR: Exceeded timeout (25000) while waiting send intermediate work fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
      

      This error occurs because FragmentsRunner uses a hardcoded per-fragment timeout, RPC_WAIT_IN_MSECS_PER_FRAGMENT, set to 5 seconds. The total wait scales with the number of fragments sent, which is why the error above reports 25000 ms for 5 fragments (5 × 5000 ms). Increasing the timeout to 10 seconds resolved the sporadic failures that were observed. The default should be raised to 10 seconds, and the timeout should also be configurable via the SystemOptionManager.
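      A minimal Java sketch of the proposed behavior, assuming the total wait equals the per-fragment timeout multiplied by the number of fragments sent (consistent with the 25000 ms reported for 5 fragments). The option key, the class and method names, and the plain Map standing in for SystemOptionManager are hypothetical illustrations, not Drill's actual API.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class FragmentSendTimeoutSketch {
    // Hypothetical option key; the real Drill option name may differ.
    static final String RPC_TIMEOUT_OPTION = "hypothetical.fragrunner.rpc.timeout.ms";
    // Proposed default from the issue: 10 seconds per fragment.
    static final long DEFAULT_TIMEOUT_MS_PER_FRAGMENT = 10_000L;

    /** Reads the per-fragment timeout from a Map standing in for SystemOptionManager. */
    static long perFragmentTimeoutMs(Map<String, Long> systemOptions) {
        return systemOptions.getOrDefault(RPC_TIMEOUT_OPTION, DEFAULT_TIMEOUT_MS_PER_FRAGMENT);
    }

    /** The total wait budget scales with the number of fragments sent. */
    static long totalTimeoutMs(long perFragmentMs, int fragmentsSent) {
        return perFragmentMs * fragmentsSent;
    }

    /** Waits for one acknowledgement per fragment; throws if they do not all arrive in time. */
    static void awaitAcks(CountDownLatch acks, int fragmentsSent, long perFragmentMs)
            throws InterruptedException {
        long timeoutMs = totalTimeoutMs(perFragmentMs, fragmentsSent);
        if (!acks.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            long heard = fragmentsSent - acks.getCount();
            // Message modeled on the error quoted in the issue.
            throw new IllegalStateException(String.format(
                "Exceeded timeout (%d) while waiting send intermediate work fragments to remote nodes. "
                    + "Sent %d and only heard response back from %d nodes.",
                timeoutMs, fragmentsSent, heard));
        }
    }

    public static void main(String[] args) throws Exception {
        // Old hardcoded value: 5 fragments x 5000 ms.
        System.out.println(totalTimeoutMs(5_000L, 5)); // 25000
        // No override set, so the proposed 10 s default applies: 5 x 10000 ms.
        System.out.println(totalTimeoutMs(perFragmentTimeoutMs(Map.of()), 5)); // 50000

        // Simulate 5 fragments sent where only 4 remote nodes respond.
        int sent = 5;
        CountDownLatch acks = new CountDownLatch(sent);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(acks::countDown);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
        try {
            // Tiny per-fragment timeout so the demo finishes quickly.
            awaitAcks(acks, sent, 10L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Reading the value from the option store on each query, instead of from a compile-time constant, is what lets an administrator raise the timeout on a slow cluster without rebuilding Drill.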

            People

              Assignee: Timothy Farkas
              Reporter: Timothy Farkas
              Reviewer: Boaz Ben-Zvi
              Votes: 0
              Watchers: 4