
    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:
      None

      Description

      All of these classes need to be updated to support transparent query retries, and each one could do with some refactoring so that query retries don't make this code even more complex. For now, I'm going to list out some ideas / suggestions:

      • Rename ImpalaServer to ImpalaService. I think ImpalaServer is a bit of a misnomer because Impala isn't implementing its own server (it uses Thrift for that); instead, it provides a "service" to end users. This name is also consistent with Thrift "service"s.
      • Split up ClientRequestState. I'm not sure I fully understand what ClientRequestState is supposed to encapsulate. Perhaps it originally captured the state of the actual client request plus some helper code, but it seems to have evolved over time; it no longer looks like a purely "stateful" object (e.g. it manages admission control submission).

      One possible end state could be:

      ImpalaService <–> QueryDriver (has a ClientRequestState that is not exposed externally) <–> QueryInstance <–> Coordinator

      The QueryDriver is responsible for end-to-end (E2E) execution of a query, including all stages: parsing / planning of the query, submission to admission control, and backend execution. A QueryInstance is a single instance of a query; this is necessary for query retry support, since a single query can be run multiple times. The Coordinator remains mostly the same: it is purely responsible for backend coordination / execution of a query.
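
      To make the proposed layering concrete, here is a minimal sketch of how a QueryDriver could own one QueryInstance per attempt and retry transparently. Everything in it is an assumption for illustration: the method names, the ExecResult struct, the `retryable` flag, the `max_attempts` limit, and the callback that stands in for real backend execution do not come from the Impala code base.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical outcome of one backend execution attempt.
struct ExecResult {
  bool ok;
  bool retryable;  // assumed flag, e.g. set when an executor was lost
  std::vector<std::string> rows;
};

// Purely backend coordination. Stubbed with a callback so the sketch is standalone.
class Coordinator {
 public:
  using ExecFn = std::function<ExecResult(const std::string& plan)>;
  explicit Coordinator(ExecFn fn) : exec_fn_(std::move(fn)) {}
  ExecResult Exec(const std::string& plan) { return exec_fn_(plan); }

 private:
  ExecFn exec_fn_;
};

// One attempt at running a query. Each retry creates a fresh QueryInstance.
class QueryInstance {
 public:
  QueryInstance(int attempt, Coordinator::ExecFn fn)
      : attempt_(attempt), coord_(std::move(fn)) {}
  ExecResult Run(const std::string& plan) { return coord_.Exec(plan); }
  int attempt() const { return attempt_; }

 private:
  int attempt_;
  Coordinator coord_;
};

// Owns the whole lifecycle of a query: plan, admit, execute, retry.
class QueryDriver {
 public:
  QueryDriver(Coordinator::ExecFn fn, int max_attempts)
      : exec_fn_(std::move(fn)), max_attempts_(max_attempts) {
    assert(max_attempts_ > 0);
  }

  ExecResult Execute(const std::string& sql) {
    state_.sql = sql;
    const std::string plan = "plan(" + sql + ")";  // stands in for parsing / planning
    ExecResult result{false, false, {}};
    for (int attempt = 1; attempt <= max_attempts_; ++attempt) {
      // Submission to admission control would happen here, before each attempt.
      instances_.push_back(std::make_unique<QueryInstance>(attempt, exec_fn_));
      state_.attempts = attempt;
      result = instances_.back()->Run(plan);
      if (result.ok || !result.retryable) break;  // stop on success or a fatal error
    }
    return result;
  }

  int num_instances() const { return static_cast<int>(instances_.size()); }

 private:
  // Plain state only; deliberately not exposed to callers.
  struct ClientRequestState {
    std::string sql;
    int attempts = 0;
  };

  Coordinator::ExecFn exec_fn_;
  int max_attempts_;
  ClientRequestState state_;
  std::vector<std::unique_ptr<QueryInstance>> instances_;
};
```

      In this sketch the caller (the service layer) only ever sees QueryDriver::Execute; whether the query ran once or several times is visible only through num_instances(), which is one way the retry could stay transparent to clients.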

      This provides an opportunity to move a lot of the execution-specific logic out of ImpalaServer and into QueryDriver. Currently, ImpalaServer is responsible for submitting the query to the frontend (fe/) and then passing the result to the ClientRequestState, which submits it to admission control (and eventually to the Coordinator for execution).

      QueryDriver encapsulates the E2E execution of a query: it starts from a query string and returns the results of the query. The design is inspired by Hive's IDriver interface (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/IDriver.java).
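
      As a rough illustration of the "query string in, results out" contract, the interface could look something like the sketch below. The names IQueryDriver, Run, FetchResults, and Close are hypothetical; they are not copied from Hive's IDriver or from Impala, and the EchoQueryDriver implementation exists only so the example is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface for end-to-end query execution.
class IQueryDriver {
 public:
  virtual ~IQueryDriver() = default;
  // Runs the query end to end; returns false on failure.
  virtual bool Run(const std::string& sql) = 0;
  // Appends up to max_rows rows to *out; returns false if nothing was appended.
  virtual bool FetchResults(std::size_t max_rows, std::vector<std::string>* out) = 0;
  // Releases any resources held for the query.
  virtual void Close() = 0;
};

// Toy implementation: "executes" a statement by echoing it back as three rows.
class EchoQueryDriver : public IQueryDriver {
 public:
  bool Run(const std::string& sql) override {
    if (sql.empty()) return false;
    rows_ = {sql + "#0", sql + "#1", sql + "#2"};
    next_ = 0;
    return true;
  }

  bool FetchResults(std::size_t max_rows, std::vector<std::string>* out) override {
    assert(out != nullptr);
    if (max_rows == 0 || next_ >= rows_.size()) return false;
    const std::size_t end = std::min(rows_.size(), next_ + max_rows);
    for (; next_ < end; ++next_) out->push_back(rows_[next_]);
    return true;
  }

  void Close() override {
    rows_.clear();
    next_ = 0;
  }

 private:
  std::vector<std::string> rows_;
  std::size_t next_ = 0;  // index of the next row to hand out
};
```

      A caller would invoke Run once, loop on FetchResults until it returns false, and then call Close. Keeping the interface this narrow is what would let the service layer stay unaware of planning, admission control, and retries.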

              People

              • Assignee: Unassigned
              • Reporter: Sahil Takiar (stakiar)
              • Votes: 0
              • Watchers: 2
