Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-1259

C++ Client scanner API uses lots of RAM for empty projections

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Public beta
    • Fix Version/s: 0.7.0
    • Component/s: client, impala
    • Labels:
      None

      Description

      Currently, the server side will base the number of rows per batch to achieve a few MB per response, or a time budget, whichever comes first. But, when scanning an empty projection (for a COUNT query in Impala for example), the size budget is never achieved, so we can end up with response batches with upwards of 80M rows. This is fine on the RPC layer, but when the client tries to expand these into a vector<KuduRowResult>, we end up taking sizeof(KuduRowResult)*80M = 1.3GB of RAM.

      When running COUNT queries on Impala against tables with lots of tablets, this quickly pushes the impalad up to 60GB+ RAM usage given it starts a number of scanners in parallel.

        Attachments

          Activity

            People

            • Assignee:
              tlipcon Todd Lipcon
              Reporter:
              tlipcon Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: