IMPALA-5388

wrong results under stress with secure cluster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0, Impala 2.4.0, Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Distributed Exec

      Description

      The stress test, run against a secure (Kerberos + SSL) cluster, finds that some queries produce wrong results. I haven't yet been able to pin down why, but I'm filing this bug now to record what I have. Note that the failures are intermittent: during a run, the same queries sometimes produce correct results and sometimes do not.

      Queries the stress test has reported as producing wrong results:

      tpch-q3, tpcds-q34, tpch-q12, tpch-q7
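
      One way to catch intermittent wrong results like these is to compare a digest of each run's result set against a digest from a known-good run. The following is a minimal sketch under that assumption; it is not taken from the stress test's code, and the names `result_digest` and `find_mismatches`, as well as the sample rows, are hypothetical.

```python
import hashlib


def result_digest(rows):
    """Return an order-insensitive SHA-256 digest of a result set.

    `rows` is any iterable of row tuples. Sorting the encoded rows makes
    the digest independent of the order in which rows were returned.
    """
    encoded = sorted(repr(tuple(row)).encode("utf-8") for row in rows)
    digest = hashlib.sha256()
    for item in encoded:
        digest.update(item)
        digest.update(b"\n")  # separator so row boundaries stay unambiguous
    return digest.hexdigest()


def find_mismatches(expected_digest, runs):
    """Return the indexes of runs whose digest differs from the expected one."""
    return [i for i, rows in enumerate(runs)
            if result_digest(rows) != expected_digest]


# Hypothetical data: three runs of the same query, the last one missing a row.
baseline = [(1, "1995-02-01", 0), (2, "1995-03-01", 0)]
runs = [baseline, list(reversed(baseline)), baseline[:1]]
bad_runs = find_mismatches(result_digest(baseline), runs)
```

      Because only some runs fail, a check like this has to be applied to every run rather than to a single sample.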

      In the case of tpch-q3, I managed to get complete profiles of both a correct and an incorrect run of the query. See attached.

      TPCH-Q3 is:

      select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
      from
        customer,
        orders,
        lineitem
      where
        c_mktsegment = 'BUILDING'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-15'
        and l_shipdate > '1995-03-15'
      group by
        l_orderkey,
        o_orderdate,
        o_shippriority
      order by
        revenue desc,
        o_orderdate
      limit 10
      

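      I got as far as noticing that in the "wrong results" case, fewer rows are scanned. The orders scan (01:SCAN HDFS) returns 53.80M rows instead of 70.97M, and the lineitem scan (02:SCAN HDFS) returns 323.48M instead of 323.49M. As a result, the downstream join (03:HASH JOIN) produces 11.04M rows instead of 14.57M. Compare the two exec summaries: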

      Results correct:

      Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
      ----------------------------------------------------------------------------------------------------------------------------------
      13:MERGING-EXCHANGE        1  308.431us  308.431us       10          10          0              0  UNPARTITIONED
      06:TOP-N                   8   15.787ms   18.612ms       80          10   12.00 KB       580.00 B
      12:AGGREGATE               8  506.351ms  692.593ms    1.13M       1.73M   14.24 MB      105.17 MB  FINALIZE
      11:EXCHANGE                8   75.520ms  138.635ms    1.13M       1.73M          0              0  HASH(l_orderkey,o_orderdate...
      05:AGGREGATE               8  389.129ms  650.835ms    1.13M       1.73M   13.70 MB      105.17 MB  STREAMING
      04:HASH JOIN               8    1s901ms    2s576ms    2.99M       1.73M  153.17 MB       12.98 MB  INNER JOIN, PARTITIONED
      |--10:EXCHANGE             8  235.256ms  552.595ms    3.00M       3.00M          0              0  HASH(c_custkey)
      |  00:SCAN HDFS            5  323.828ms  621.551ms    3.00M       3.00M   29.30 MB      176.00 MB  tpch_100_parquet.customer
      09:EXCHANGE                8  728.297ms  758.348ms   14.57M       6.00M          0              0  HASH(o_custkey)
      03:HASH JOIN               8   24s679ms   29s349ms   14.57M       6.00M  777.11 MB       98.35 MB  INNER JOIN, PARTITIONED
      |--08:EXCHANGE             8    5s521ms    8s310ms   70.97M      15.00M          0              0  HASH(o_orderkey)
      |  01:SCAN HDFS            8    3s626ms    7s399ms   70.97M      15.00M   80.88 MB      352.00 MB  tpch_100_parquet.orders
      07:EXCHANGE                8   14s268ms   15s285ms  323.49M      60.00M          0              0  HASH(l_orderkey)
      02:SCAN HDFS               8   11s632ms   17s863ms  323.49M      60.00M   78.65 MB      352.00 MB  tpch_100_parquet.lineitem
      

      Results incorrect:

          ExecSummary:
      Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
      ----------------------------------------------------------------------------------------------------------------------------------
      13:MERGING-EXCHANGE        1  304.504us  304.504us       10          10          0              0  UNPARTITIONED
      06:TOP-N                   8   19.261ms   29.196ms       80          10   12.00 KB       580.00 B
      12:AGGREGATE               8  305.220ms  449.997ms    1.13M       1.73M   14.24 MB      105.17 MB  FINALIZE
      11:EXCHANGE                8   66.207ms   96.284ms    1.13M       1.73M          0              0  HASH(l_orderkey,o_orderdate...
      05:AGGREGATE               8  516.324ms  653.086ms    1.13M       1.73M   13.58 MB      105.17 MB  STREAMING
      04:HASH JOIN               8    1s217ms    1s461ms    2.99M       1.73M  153.17 MB       12.98 MB  INNER JOIN, PARTITIONED
      |--10:EXCHANGE             8  150.899ms  213.929ms    3.00M       3.00M          0              0  HASH(c_custkey)
      |  00:SCAN HDFS            5  937.452ms    1s753ms    3.00M       3.00M   29.09 MB      176.00 MB  tpch_100_parquet.customer
      09:EXCHANGE                8  563.317ms  581.895ms   11.04M       6.00M          0              0  HASH(o_custkey)
      03:HASH JOIN               8   24s420ms   28s126ms   11.04M       6.00M  649.11 MB       98.35 MB  INNER JOIN, PARTITIONED
      |--08:EXCHANGE             8    2s733ms    2s967ms   53.80M      15.00M          0              0  HASH(o_orderkey)
      |  01:SCAN HDFS            8   30s937ms   47s728ms   53.80M      15.00M   85.11 MB      352.00 MB  tpch_100_parquet.orders
      07:EXCHANGE                8   13s816ms   14s173ms  323.49M      60.00M          0              0  HASH(l_orderkey)
      02:SCAN HDFS               8   10s053ms   12s288ms  323.48M      60.00M   78.57 MB      352.00 MB  tpch_100_parquet.lineitem
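
      To make the row-count differences between the two summaries easy to spot, the #Rows column can be compared per operator. The sketch below is one possible way to do that, assuming the column layout shown in the summaries above. The regular expression and the helper names (`parse_count`, `rows_by_operator`, `diff_rows`) are illustrative and not part of Impala, and the sample strings are excerpts of the two summaries.

```python
import re

# Operator id and name, then #Hosts, Avg Time, Max Time, #Rows, Est. #Rows.
# The optional "|--" / "|  " prefix marks the build side of a join.
ROW = re.compile(
    r"^[|\-\s]*(\d+):(.+?)\s{2,}(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")
SUFFIX = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_count(text):
    """Convert a row count such as '70.97M' or '80' to an integer."""
    if text[-1] in SUFFIX:
        return round(float(text[:-1]) * SUFFIX[text[-1]])
    return round(float(text))


def rows_by_operator(summary):
    """Map 'id:NAME' to the #Rows value for each operator line."""
    rows = {}
    for line in summary.splitlines():
        match = ROW.match(line)
        if match:
            rows[f"{match.group(1)}:{match.group(2)}"] = parse_count(match.group(6))
    return rows


def diff_rows(good, bad):
    """Return {operator: (good_rows, bad_rows)} where the counts differ."""
    g, b = rows_by_operator(good), rows_by_operator(bad)
    return {op: (g[op], b[op]) for op in g if op in b and g[op] != b[op]}


GOOD = """
03:HASH JOIN               8   24s679ms   29s349ms   14.57M       6.00M  777.11 MB       98.35 MB  INNER JOIN, PARTITIONED
|  01:SCAN HDFS            8    3s626ms    7s399ms   70.97M      15.00M   80.88 MB      352.00 MB  tpch_100_parquet.orders
07:EXCHANGE                8   14s268ms   15s285ms  323.49M      60.00M          0              0  HASH(l_orderkey)
02:SCAN HDFS               8   11s632ms   17s863ms  323.49M      60.00M   78.65 MB      352.00 MB  tpch_100_parquet.lineitem
"""
BAD = """
03:HASH JOIN               8   24s420ms   28s126ms   11.04M       6.00M  649.11 MB       98.35 MB  INNER JOIN, PARTITIONED
|  01:SCAN HDFS            8   30s937ms   47s728ms   53.80M      15.00M   85.11 MB      352.00 MB  tpch_100_parquet.orders
07:EXCHANGE                8   13s816ms   14s173ms  323.49M      60.00M          0              0  HASH(l_orderkey)
02:SCAN HDFS               8   10s053ms   12s288ms  323.48M      60.00M   78.57 MB      352.00 MB  tpch_100_parquet.lineitem
"""

for op, (good_rows, bad_rows) in diff_rows(GOOD, BAD).items():
    print(f"{op}: {good_rows} -> {bad_rows}")
```

      Note that the summaries round row counts to two decimal places, so small differences (such as the one on the lineitem scan) are only approximate. The full profiles carry the exact counters.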
      

              People

              • Assignee: Michael Ho (kwho)
              • Reporter: Michael Brown (mikesbrown)