IMPALA-6395

Allow the accumulated row batch size of a data sink to be tunable

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • Impala 2.12.0
    • Impala 3.0
    • Distributed Exec
    • None

    Description

      During scale testing, we noticed that the size of the accumulated row batches in the data stream sender affects Impala's performance. This is expected, as a larger row batch amortizes the per-batch cost of compression and RPC. The size is currently hard-coded to 16KB per channel, as in the snippet below. An experiment on a 38-node cluster with 48 concurrent users running 10TB TPC-DS showed about a 5% improvement in queries per hour when the value was raised from 16KB to 512KB. The setting is a tradeoff between memory consumption and performance, so exposing it as a flag allows us to tune for performance more easily.

            if (FLAGS_use_krpc) {
              *sink = pool->Add(new KrpcDataStreamSender(fragment_instance_ctx.sender_id,
                  row_desc, thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024,
                  state));
            } else {
              // TODO: figure out good buffer size based on size of output row
              *sink = pool->Add(new DataStreamSender(fragment_instance_ctx.sender_id, row_desc,
                  thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024, state));
            }
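
      To make the tradeoff concrete, here is a rough back-of-the-envelope sketch, not code from the Impala tree. It assumes that a channel sends one batch each time the configured number of bytes has accumulated and that a sender holds one channel per destination; the function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The two sizes discussed in this issue: the hard-coded default and the
// value used in the scale test.
constexpr int64_t kDefaultBufferSize = 16 * 1024;   // 16KB
constexpr int64_t kTunedBufferSize = 512 * 1024;    // 512KB

// Approximate number of batch sends (RPCs) needed to ship `total_bytes`
// through a single channel, assuming a batch is sent each time
// `buffer_size` bytes have accumulated (hypothetical helper).
int64_t NumBatchesSent(int64_t total_bytes, int64_t buffer_size) {
  assert(total_bytes >= 0 && buffer_size > 0);
  return (total_bytes + buffer_size - 1) / buffer_size;  // ceiling division
}

// Upper bound on the bytes one sender can hold in accumulated row batches,
// assuming one channel per destination (hypothetical helper).
int64_t MaxBufferedBytes(int64_t num_channels, int64_t buffer_size) {
  assert(num_channels >= 0 && buffer_size > 0);
  return num_channels * buffer_size;
}
```

      Under these assumptions, shipping 1GiB through one channel takes 65536 sends at 16KB but only 2048 at 512KB, while a sender with 38 channels can buffer up to about 19MB at 512KB instead of about 608KB at 16KB.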
      

          People

            kwho Michael Ho