  1. IMPALA
  2. IMPALA-6395

Allow the accumulated row batch size of a data sink to be tunable

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 3.0
    • Component/s: Distributed Exec
    • Labels:
      None

      Description

      During scale testing, we noticed that the size of the accumulated row batches in the data stream sender affects Impala's performance. This is expected, as a larger row batch amortizes the per-batch cost of compression and RPC. The default value is 16KB per channel and is currently hardcoded where the sink is created. An experiment on a 38-node cluster with 48 concurrent users running 10TB TPC-DS showed about a 5% improvement in queries per hour when the value was raised to 512KB. The setting is a tradeoff between memory consumption and performance, so exposing it as a flag would let us tune for performance more easily.
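
      To make the tradeoff concrete, here is a rough back-of-the-envelope sketch, not a measurement. It assumes that a sender keeps one accumulated batch per channel and sends a batch once it reaches the configured size. The helper names `NumBatches` and `SenderBufferBytes` are hypothetical and do not exist in Impala.

```cpp
#include <cstdint>

// Hypothetical helper: number of batches (and therefore sends) needed to
// ship total_bytes through one channel when each batch holds batch_bytes.
int64_t NumBatches(int64_t total_bytes, int64_t batch_bytes) {
  return (total_bytes + batch_bytes - 1) / batch_bytes;  // ceiling division
}

// Hypothetical helper: buffer memory held by one sender, assuming one
// accumulated batch per channel.
int64_t SenderBufferBytes(int64_t num_channels, int64_t batch_bytes) {
  return num_channels * batch_bytes;
}
```

      Under these assumptions, shipping 1 GiB through a single channel takes 65536 batches at 16KB and 2048 batches at 512KB, which is 32 times fewer sends. In exchange, a sender with an assumed 38 channels would hold 19456KB of buffers instead of 608KB.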

            if (FLAGS_use_krpc) {
              *sink = pool->Add(new KrpcDataStreamSender(fragment_instance_ctx.sender_id,
                  row_desc, thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024,
                  state));
            } else {
              // TODO: figure out good buffer size based on size of output row
              *sink = pool->Add(new DataStreamSender(fragment_instance_ctx.sender_id, row_desc,
                  thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024, state));
            }
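
      One way to make the hardcoded `16 * 1024` tunable is to read the size from a startup option and pass the result to both sender constructors. The sketch below is a minimal standalone illustration, not the actual patch: the option name `--sender_buffer_size` and the function `ParseSenderBufferSize` are hypothetical, and a real change in Impala would presumably go through its existing flag mechanism (as `FLAGS_use_krpc` does) rather than hand-rolled parsing.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Default taken from the snippet above: 16KB per channel.
constexpr int64_t kDefaultSenderBufferSize = 16 * 1024;

// Hypothetical helper: returns the size from the first well-formed argument
// of the form "--sender_buffer_size=<bytes>". Falls back to the default if
// no such argument exists or none carries a positive integer.
int64_t ParseSenderBufferSize(int argc, const char* const* argv) {
  const std::string prefix = "--sender_buffer_size=";
  for (int i = 1; i < argc; ++i) {
    const std::string arg = argv[i];
    if (arg.compare(0, prefix.size(), prefix) != 0) continue;
    const std::string value = arg.substr(prefix.size());
    char* end = nullptr;
    const long long parsed = std::strtoll(value.c_str(), &end, 10);
    // Accept only a fully consumed, positive number.
    if (!value.empty() && *end == '\0' && parsed > 0) return parsed;
  }
  return kDefaultSenderBufferSize;
}
```

      The returned value would then replace the literal in both branches, so the KRPC and non-KRPC senders stay consistent.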
      

            People

            • Assignee:
              Michael Ho (kwho)
            • Reporter:
              Michael Ho (kwho)