  1. IMPALA
  2. IMPALA-8656 Support for eagerly fetching and spooling all query result rows
  3. IMPALA-8786

BufferedPlanRootSink should directly write to a QueryResultSet if one is available

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Later
    • None
    • Not Applicable
    • Backend
    • None

    Description

      BufferedPlanRootSink uses a RowBatchQueue to buffer RowBatches; the consumer thread then reads them and writes them to a given QueryResultSet. Implementations of RowBatchQueue might end up copying the buffered RowBatches (e.g. if the queue is backed by a BufferedTupleStream). An optimization would be for the producer thread to write directly to the consumer's QueryResultSet, bypassing the queue. This optimization would be triggered only if (1) the queue is empty, and (2) the consumer thread has a QueryResultSet available for writing.

      This "fast path" is useful in a few different scenarios:

      • If the consumer is faster at reading rows than the producer is at sending them; in this case, the overhead of buffering rows in a RowBatchQueue can be completely avoided
      • For queries that return under 1024 rows, it's likely that the consumer will produce a QueryResultSet before the first RowBatch is returned (except perhaps for very trivial queries)
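
A minimal sketch of the proposed fast path, to make the two trigger conditions concrete. This is not Impala's code: `Row`, `RowBatch`, `QueryResultSet`, and `BufferedSinkSketch` below are simplified, hypothetical stand-ins, and the sketch assumes one RowBatch per QueryResultSet and a non-blocking consumer (a real consumer would presumably wait on a condition variable rather than return).

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real Impala classes have different interfaces.
using Row = std::string;
using RowBatch = std::vector<Row>;

struct QueryResultSet {
  std::vector<Row> rows;
  void AddRows(const RowBatch& batch) {
    rows.insert(rows.end(), batch.begin(), batch.end());
  }
};

class BufferedSinkSketch {
 public:
  // Producer side. Returns true if the batch took the fast path, i.e. it was
  // written straight into the consumer's result set without being queued.
  bool Send(RowBatch batch) {
    std::lock_guard<std::mutex> l(lock_);
    // Fast path: (1) the queue is empty and (2) a result set is available.
    if (queue_.empty() && pending_results_ != nullptr) {
      pending_results_->AddRows(batch);
      pending_results_ = nullptr;  // Assumption: one batch per result set.
      return true;
    }
    // Slow path: buffer the batch for a later GetNext() call.
    queue_.push_back(std::move(batch));
    return false;
  }

  // Consumer side. Returns true if 'results' was filled from the queue.
  // Otherwise registers 'results' so the next Send() can write to it directly.
  bool GetNext(QueryResultSet* results) {
    std::lock_guard<std::mutex> l(lock_);
    if (!queue_.empty()) {
      results->AddRows(queue_.front());
      queue_.pop_front();
      return true;
    }
    pending_results_ = results;
    return false;
  }

  // Number of batches currently buffered in the queue.
  std::size_t QueuedBatches() {
    std::lock_guard<std::mutex> l(lock_);
    return queue_.size();
  }

 private:
  std::mutex lock_;
  std::deque<RowBatch> queue_;                 // Stand-in for RowBatchQueue.
  QueryResultSet* pending_results_ = nullptr;  // Set while a consumer waits.
};
```

In this sketch, a consumer that calls `GetNext()` on an empty queue leaves its result set registered, and the next `Send()` writes into it without touching the queue. A `Send()` that arrives with no registered result set falls back to buffering.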

          People

            stakiar Sahil Takiar
            stakiar Sahil Takiar