IMPALA-3311

String data coming out of agg can be corrupted by blocking operators

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.6.0
    • Component/s: Backend
    • Labels:

      Description

      Variable-length data (e.g. strings) produced by aggregations is freed once the output has been passed up to the parent operator (https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/partitioned-aggregation-node.cc#L1353). This works for streaming operators and for blocking operators that copy their input, but it causes memory corruption when the output reaches a blocking operator that does not copy its input.
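
      The lifetime problem described above can be illustrated with a small standalone sketch. This is not Impala code: ToyPool, ToyAgg, and the three Run* functions are hypothetical names made up for illustration, and "freeing" is simulated by overwriting a fixed buffer with filler bytes so that the stale reads stay well-defined. The sketch also assumes the producer releases the previous row's memory at the start of the next GetNext() call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory pool. The buffer has a fixed size and is
// never reallocated, so views into it stay readable after Recycle().
class ToyPool {
 public:
  explicit ToyPool(std::size_t capacity) : buf_(capacity, '#') {}

  // Copies s into the pool and returns a non-owning view of the copy.
  std::string_view Store(const std::string& s) {
    assert(used_ + s.size() <= buf_.size());
    char* dst = buf_.data() + used_;
    std::memcpy(dst, s.data(), s.size());
    used_ += s.size();
    return std::string_view(dst, s.size());
  }

  // Simulates freeing: the bytes are overwritten with filler and reused.
  void Recycle() {
    std::fill(buf_.begin(), buf_.end(), '#');
    used_ = 0;
  }

 private:
  std::vector<char> buf_;
  std::size_t used_ = 0;
};

// Hypothetical aggregation that hands out one string result per call and
// releases the previous result's memory on the next call (an assumption).
class ToyAgg {
 public:
  explicit ToyAgg(std::vector<std::string> results)
      : results_(std::move(results)), pool_(64) {}

  bool GetNext(std::string_view* out) {
    pool_.Recycle();  // the previously returned row is "freed" here
    if (next_ >= results_.size()) return false;
    *out = pool_.Store(results_[next_++]);
    return true;
  }

 private:
  std::vector<std::string> results_;
  ToyPool pool_;
  std::size_t next_ = 0;
};

// Streaming consumer: each row is fully consumed before the next GetNext().
std::vector<std::string> RunStreaming() {
  ToyAgg agg(std::vector<std::string>{"min_a", "min_b"});
  std::vector<std::string> seen;
  std::string_view row;
  while (agg.GetNext(&row)) seen.emplace_back(row);
  return seen;
}

// Blocking consumer that copies: owns its data, so later frees are harmless.
std::vector<std::string> RunBlockingWithCopy() {
  ToyAgg agg(std::vector<std::string>{"min_a", "min_b"});
  std::vector<std::string> buffered;
  std::string_view row;
  while (agg.GetNext(&row)) buffered.emplace_back(row);  // deep copy
  return buffered;  // read only after the input is exhausted
}

// Blocking consumer that does not copy: keeps views into recycled memory.
std::vector<std::string> RunBlockingNoCopy() {
  ToyAgg agg(std::vector<std::string>{"min_a", "min_b"});
  std::vector<std::string_view> buffered;
  std::string_view row;
  while (agg.GetNext(&row)) buffered.push_back(row);  // pointer only
  std::vector<std::string> seen;
  for (std::string_view v : buffered) seen.emplace_back(v);  // stale reads
  return seen;
}
```

      In this toy model, RunStreaming() and RunBlockingWithCopy() return the original strings, while RunBlockingNoCopy() returns filler bytes, because every buffered view points at memory the producer has already recycled. In a real process the same pattern would be a use-after-free rather than a benign overwrite.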

      Repro
      Build Impala with ASAN, start an impalad with the -disable_mem_pools flag, and run the following query:

      select id, m from functional_parquet.complextypestbl t,
      (select min(cast(item as string)) m from t.int_array) v
      

      I've attached the ASAN output from running this query (asan_output.txt).

      Symptoms
      If the query plan contains an aggregation node that produces string values anywhere within a subplan (i.e. the SQL statement has an aggregate function inside an inline view over a collection column), the results of the aggregation may be incorrect.

        Attachments

        1. asan_output.txt
          17 kB
          Skye Wanderman-Milne

            People

            • Assignee:
              Skye Wanderman-Milne
            • Reporter:
              Skye Wanderman-Milne