Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1580

Optimize conversion of row batch to query result set

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Duplicate
    • Impala 2.0.1
    • None
    • Perf Investigation

    Description

      For simple queries that produce a large result set such as "select * from tpch.lineitem" the server execution time is limited by the time required to convert row batches (results in the internal structure) to query results (the structure to be sent to the client). The data conversion is the limiting factor in this case because the query plan execution happens in parallel.

      Here are some data points from the profile of "select * from tpch.lineitem" using HS2 (this was taken using --exchg_node_buffer_size_bytes=2048576000 so the exchange node would never block because of a full buffer.). Beeswax takes even longer to convert the rows.

      • Query Timeline: 1m9s
      • Execution Profile – Total: 1s295ms
      • ClientFetchWaitTimer: 52s553ms
      • RowMaterializationTimer: 15s216ms
      • Coordinator Fragment F01:(Total: 1s092ms
      • Averaged Fragment F00:(Total: 5s608ms

      So the "RowMaterializationTimer", which is actually conversion time, adds ~9 seconds or ~2x the plan execution time to the overall time.

      Ideally the conversion time would be codegen'd but even without that there should be a lot of room for improvement by reducing function calls.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              caseyc casey
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: