Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6781

Stress tests should take into account of non-deterministic results of queries

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 3.1.0
    • Impala 3.1.0
    • Infrastructure
    • None
    • ghx-label-7

    Description

      Certain queries have the pattern of ordery-by with a limit clause. Due to the limit clause, we may not include all the rows for a given order-by keys in the output. It's non-deterministic that which rows will be included in the output. Given the stress test will store the hashes of the query result obtained during binary search, we may sometimes get false alarm on result mis-match. One way to fix it is that we need to order all the columns so that the results will always be deterministic.

      For instance, the following TPC-H Q3 query may have multiple correct results:

          Default Db: tpch_10000_parquet
          Sql Statement: /* Mem: 8152 MB. Coordinator: vc1503.halxg.cloudera.com. */
      select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
      from
        customer,
        orders,
        lineitem
      where
        c_mktsegment = 'BUILDING'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-15'
        and l_shipdate > '1995-03-15'
      group by
        l_orderkey,
        o_orderdate,
        o_shippriority
      order by
        revenue desc,
        o_orderdate
      limit 10
      

      Correct result 1:

      +-------------+-------------+-------------+----------------+
      | l_orderkey  | revenue     | o_orderdate | o_shippriority |
      +-------------+-------------+-------------+----------------+
      | 18134335556 | 510414.0360 | 1995-02-13  | 0              |
      | 52494073892 | 510414.0360 | 1995-02-13  | 0              |
      | 26724270146 | 510414.0360 | 1995-02-13  | 0              |
      | 954466400   | 510414.0360 | 1995-02-13  | 0              |
      | 9544400966  | 510414.0360 | 1995-02-13  | 0              |
      | 35314204736 | 510414.0360 | 1995-02-13  | 0              |
      | 43904139302 | 510414.0360 | 1995-02-13  | 0              |
      | 7133834945  | 503732.3318 | 1995-02-25  | 0              |
      | 24313704101 | 503732.3318 | 1995-02-25  | 0              |
      | 58673442437 | 503732.3318 | 1995-02-25  | 0              |
      +-------------+-------------+-------------+----------------+
      

      Correct result 2:

      +-------------+-------------+-------------+----------------+
      | l_orderkey  | revenue     | o_orderdate | o_shippriority |
      +-------------+-------------+-------------+----------------+
      | 26724270146 | 510414.0360 | 1995-02-13  | 0              |
      | 43904139302 | 510414.0360 | 1995-02-13  | 0              |
      | 35314204736 | 510414.0360 | 1995-02-13  | 0              |
      | 52494073892 | 510414.0360 | 1995-02-13  | 0              |
      | 18134335556 | 510414.0360 | 1995-02-13  | 0              |
      | 9544400966  | 510414.0360 | 1995-02-13  | 0              |
      | 954466400   | 510414.0360 | 1995-02-13  | 0              |
      | 32903638691 | 503732.3318 | 1995-02-25  | 0              |
      | 15723769511 | 503732.3318 | 1995-02-25  | 0              |
      | 50083507847 | 503732.3318 | 1995-02-25  | 0              |
      +-------------+-------------+-------------+----------------+
      

      If one changes the limit clause to 15, we will see that the last 3 rows should be a subset of rows with (503732.3318, 1995-02-25) as the ordering keys revenue and o_orderdate are identical so it's non-deterministic which rows are shown. The result will be deterministic if the order-by clause also include l_orderkey and o_shippriority

      +-------------+-------------+-------------+----------------+
      | l_orderkey  | revenue     | o_orderdate | o_shippriority |
      +-------------+-------------+-------------+----------------+
      | 26724270146 | 510414.0360 | 1995-02-13  | 0              |
      | 43904139302 | 510414.0360 | 1995-02-13  | 0              |
      | 35314204736 | 510414.0360 | 1995-02-13  | 0              |
      | 52494073892 | 510414.0360 | 1995-02-13  | 0              |
      | 18134335556 | 510414.0360 | 1995-02-13  | 0              |
      | 9544400966  | 510414.0360 | 1995-02-13  | 0              |
      | 954466400   | 510414.0360 | 1995-02-13  | 0              |
      | 32903638691 | 503732.3318 | 1995-02-25  | 0              |
      | 15723769511 | 503732.3318 | 1995-02-25  | 0              |
      | 50083507847 | 503732.3318 | 1995-02-25  | 0              |
      | 7133834945  | 503732.3318 | 1995-02-25  | 0              |
      | 41493573281 | 503732.3318 | 1995-02-25  | 0              |
      | 24313704101 | 503732.3318 | 1995-02-25  | 0              |
      | 58673442437 | 503732.3318 | 1995-02-25  | 0              |
      | 54409232481 | 499104.0645 | 1995-03-10  | 0              |
      +-------------+-------------+-------------+----------------+
      

      Attachments

        Activity

          People

            mikeb Michael Brown
            kwho Michael Ho
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: