IMPALA-2174

Improve the calculation of the TupleDescriptor::avgSerializedSize_


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.4.1, Impala 2.2, Impala 2.3.0
    • Fix Version/s: Impala 2.5.0
    • Component/s: Frontend

    Description

      Whenever we serialize a row batch, even one with zero materialized slots, we always allocate a tuple_offsets array with one entry per tuple. This adds a serialization overhead of 4 bytes per tuple in every row.

      Currently we do not account for this overhead when calculating TupleDescriptor::avgSerializedSize_ and, consequently, avgRowSize_, which is used, for example, when deciding which join input to broadcast or distribute.

      We should take this overhead into account. Such a change may affect the plans of queries with a small avgRowSize_ or with multiple tuples per row (joins).
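
      To make the arithmetic concrete, here is a minimal Java sketch of the proposed accounting. It is not the actual Impala frontend code: the class and method names (SerializedSizeSketch, avgSerializedTupleSize, avgRowSize, estimatedBytes) are hypothetical, and the model simply assumes each tuple contributes its average slot bytes plus the 4-byte tuple_offsets entry described above.

```java
// Illustrative sketch only; all names here are hypothetical, not Impala's API.
final class SerializedSizeSketch {
    // Overhead stated in the issue: one 4-byte tuple_offsets entry per tuple.
    static final int TUPLE_OFFSET_BYTES = 4;

    // Average serialized size of one tuple: slot payload plus its offset entry.
    static double avgSerializedTupleSize(double avgSlotBytes) {
        return avgSlotBytes + TUPLE_OFFSET_BYTES;
    }

    // Average row size: the sum of the serialized sizes of all tuples in the row.
    static double avgRowSize(double... avgSlotBytesPerTuple) {
        double total = 0.0;
        for (double slotBytes : avgSlotBytesPerTuple) {
            total += avgSerializedTupleSize(slotBytes);
        }
        return total;
    }

    // Assumed simple volume estimate: row count times average row size.
    static double estimatedBytes(long numRows, double avgRowSize) {
        return numRows * avgRowSize;
    }

    public static void main(String[] args) {
        // A join output row made of two tuples with no materialized slots.
        double rowSize = avgRowSize(0.0, 0.0);
        System.out.println("avg row size: " + rowSize);
        System.out.println("bytes for 1M rows: " + estimatedBytes(1_000_000L, rowSize));
    }
}
```

      Under these assumptions, a row with no materialized slots no longer has an estimated size of zero, and the relative correction is largest for narrow rows and for join outputs that carry several tuples per row.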

          People

            Assignee: alex.behm Alexander Behm
            Reporter: ippokratis Ippokratis Pandis
            Votes: 0
            Watchers: 3
