  1. Hive
  2. HIVE-23166

Guard VGB from flushing too often

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: llap
    • Labels:
      None

      Description

      The existing flush logic in our VectorGroupByOperator is completely static.
      It depends on two settings: the number of HtEntries (hive.vectorized.groupby.maxentries) and the maximum memory threshold (by default 90% of available memory).

      Assuming that we are not memory constrained, the flush frequency is currently dictated by the static number of entries (1M by default), which can also be misconfigured to a very low value.

      I propose that, along with maxHtEntries, we also take current memory usage into account, to avoid flushing too often, as frequent flushes can hurt operator throughput for particular workloads.
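
      The idea can be illustrated with a minimal sketch. It is not the logic in the attached patches. The class name, method names, and the minFlushMemFraction parameter are hypothetical and exist only for this example. The sketch assumes the static policy flushes when either the entry limit or the memory threshold is reached, and that the guarded policy lets the entry limit trigger a flush only when enough memory is in use to make the flush worthwhile.

```java
// Hypothetical illustration only; names and thresholds are assumptions,
// not the actual VectorGroupByOperator code.
final class FlushGuardSketch {

    // Static policy: flush as soon as the entry limit is hit, or when
    // memory usage exceeds memThreshold (e.g. 0.9) of the available memory.
    static boolean shouldFlushStatic(long entries, long maxEntries,
                                     long usedBytes, long maxBytes,
                                     double memThreshold) {
        return entries >= maxEntries || usedBytes > memThreshold * maxBytes;
    }

    // Guarded policy: the memory threshold always triggers a flush, but the
    // entry limit only does so when memory usage is above a lower bound
    // (minFlushMemFraction, an assumed parameter for this sketch).
    static boolean shouldFlushGuarded(long entries, long maxEntries,
                                      long usedBytes, long maxBytes,
                                      double memThreshold,
                                      double minFlushMemFraction) {
        boolean overMemory = usedBytes > memThreshold * maxBytes;
        boolean overEntries = entries >= maxEntries;
        boolean worthReclaiming = usedBytes > minFlushMemFraction * maxBytes;
        return overMemory || (overEntries && worthReclaiming);
    }

    public static void main(String[] args) {
        long maxBytes = 1000L;
        // Entry limit misconfigured to a very low value, memory mostly free.
        System.out.println(shouldFlushStatic(10, 10, 100, maxBytes, 0.9));
        System.out.println(shouldFlushGuarded(10, 10, 100, maxBytes, 0.9, 0.5));
    }
}
```

      With a low entry limit and little memory in use, the static check asks for a flush on every batch that reaches the limit, while the guarded check keeps accumulating entries until memory usage justifies the flush.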

        Attachments

        1. HIVE-23166.01.patch
          6 kB
          Panagiotis Garefalakis
        2. HIVE-23166.02.patch
          6 kB
          Panagiotis Garefalakis
        3. HIVE-23166.03.patch
          6 kB
          Panagiotis Garefalakis

            People

            • Assignee:
              pgaref Panagiotis Garefalakis
              Reporter:
              pgaref Panagiotis Garefalakis