C++ Client scanner API uses lots of RAM for empty projections

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Public beta
    • Fix Version/s: 0.7.0
    • Component/s: client, impala
    • Labels: None

    Description

      Currently, the server sizes each scan batch to reach either a few MB per response or a time budget, whichever comes first. When scanning an empty projection (for example, a COUNT query in Impala), rows contribute no data to the response, so the size budget is never reached and only the time budget limits the batch. As a result, we can end up with response batches of upwards of 80M rows. This is fine at the RPC layer, but when the client expands such a batch into a vector<KuduRowResult>, it takes sizeof(KuduRowResult) * 80M = 1.3GB of RAM.

      When running COUNT queries in Impala against tables with many tablets, this quickly pushes impalad to 60GB+ of RAM, because it starts a number of scanners in parallel and each one can hold such a batch.
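
      The arithmetic above can be reproduced with a small sketch. This is not Kudu client code: RowResultStandIn, MaterializedBytes and BatchesToReach are hypothetical names, and the 16-byte per-row size is an assumption inferred from the 1.3GB / 80M figures in the description rather than taken from the Kudu headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for KuduRowResult: two pointers, which is 16 bytes
// on a typical 64-bit build. The real layout is an assumption here.
struct RowResultStandIn {
  const void* schema;
  const std::uint8_t* row_data;
};

// Bytes needed to hold one result object per row of a batch in a vector,
// ignoring any allocator or vector growth overhead.
inline std::uint64_t MaterializedBytes(std::uint64_t num_rows,
                                       std::uint64_t bytes_per_result) {
  return num_rows * bytes_per_result;
}

// Number of such batches held at once (e.g. one per parallel scanner)
// needed to reach a given memory total, rounded up.
inline std::uint64_t BatchesToReach(std::uint64_t total_bytes,
                                    std::uint64_t bytes_per_batch) {
  assert(bytes_per_batch > 0);
  return (total_bytes + bytes_per_batch - 1) / bytes_per_batch;
}
```

      Under these assumptions, MaterializedBytes(80000000, 16) gives 1,280,000,000 bytes (about 1.3GB) for a single batch, and BatchesToReach(60000000000, 1280000000) gives 47, so a few dozen scanners each holding one such batch would be enough to account for the 60GB figure.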

          People

            tlipcon Todd Lipcon