IMPALA-7590

Stress test hit inconsistent results with TPCDS-Q18A

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 3.0, Impala 2.12.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      Recent runs of the stress test on a 135-node cluster intermittently returned inconsistent results for TPCDS-Q18a. The TPC-DS scale factor is 10000.

      --- result_correct.txt	2018-09-10 08:54:30.427603941 -0700
      +++ result_incorrect.txt	2018-09-10 17:39:59.512926323 -0700
      @@ -1,3 +1,4 @@
      +opening /tmp/stress/instance1/data/jenkins/workspace/impala-test-stress-secure-140node/archive/result_hashes/input.txt
       +------------------+------------+----------+-----------+-------+--------+----------+--------+----------+---------+------+
       | i_item_id        | ca_country | ca_state | ca_county | agg1  | agg2   | agg3     | agg4   | agg5     | agg6    | agg7 |
       +------------------+------------+----------+-----------+-------+--------+----------+--------+----------+---------+------+
      @@ -13,7 +14,7 @@
       | AAAAAAAAAABMAAAA |            | IN       |           | 67.00 | 105.60 | 2232.51  | 74.08  | -1114.55 | 1964.50 | 1.00 |
       | AAAAAAAAAABNFAAA |            | IN       |           | 40.00 | 115.76 | 0.00     | 70.61  | -459.60  | 1933.00 | 3.00 |
       | AAAAAAAAAACBBAAA |            | IN       |           | 32.00 | 37.99  | 0.00     | 8.73   | -448.64  | 1963.00 | 3.00 |
      -| AAAAAAAAAACCAAAA |            | IN       |           | 56.00 | 2.50   | 0.00     | 0.62   | -62.72   | NULL    | 4.00 |
      +| AAAAAAAAAACCAAAA |            | IN       |           | 56.00 | 2.50   | 0.00     | 0.62   | -62.72   | 38463209| 4.00 |
       | AAAAAAAAAACDCAAA |            | IN       |           | 30.00 | 53.19  | 0.00     | 17.02  | -505.80  | 1990.00 | 6.00 |
       | AAAAAAAAAACFDAAA |            | IN       |           | 58.00 | 113.96 | 0.00     | 19.37  | -2148.90 | 1974.00 | 1.00 |
       | AAAAAAAAAACHEAAA |            | IN       |           | 16.00 | 19.90  | 0.00     | 13.13  | 9.76     | 1960.00 | 3.00 |
      @@ -101,4 +102,4 @@
       | AAAAAAAAAAPKBAAA |            | IN       |           | 2.00  | 65.90  | 0.00     | 58.65  | 60.24    | 1954.00 | 3.00 |
       | AAAAAAAAAAPOAAAA |            | IN       |           | 92.00 | 125.36 | 0.00     | 94.02  | 1743.40  | 1963.00 | 6.00 |
       | AAAAAAAAAAPODAAA |            | IN       |           | 75.00 | 119.08 | 0.00     | 104.79 | 4501.50  | 1981.00 | 5.00 |
      -+------------------+------------+----------+-----------+-------+--------+----------+--------+----------+---------+------+
      \ No newline at end of file
      ++------------------+------------+----------+-----------+-------+--------+----------+--------+----------+---------+------+
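
      Apart from the extra "opening ..." log line and the trailing-newline change, the only substantive difference shown in the diff above is agg6 for item AAAAAAAAAACCAAAA (NULL in the correct run, 38463209 in the incorrect one). A minimal Python sketch for isolating such cell-level differences between two saved result tables might look like the following. The helper names parse_rows and cell_diffs are hypothetical and are not part of the Impala stress test; the sketch assumes both files contain the same ordered, pipe-delimited table and compares rows by position.

```python
def parse_rows(text):
    """Return the table rows as lists of stripped cells.

    Border lines ("+---+") and unrelated log lines are skipped, so a stray
    "opening ..." line or a missing trailing newline does not count as a
    difference.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return rows


def cell_diffs(expected_text, actual_text):
    """Compare two result tables row by row and cell by cell.

    Returns a list of (row_index, column_name, expected, actual) tuples.
    Row 0 is the header row. Rows are matched by position, which assumes
    both results are sorted the same way (the query has an ORDER BY).
    """
    expected = parse_rows(expected_text)
    actual = parse_rows(actual_text)
    diffs = []
    if len(expected) != len(actual):
        diffs.append((None, "<row count>", str(len(expected)), str(len(actual))))
    header = expected[0] if expected else []
    for index, (exp_row, act_row) in enumerate(zip(expected, actual)):
        for col, (exp_cell, act_cell) in enumerate(zip(exp_row, act_row)):
            if exp_cell != act_cell:
                name = header[col] if col < len(header) else str(col)
                diffs.append((index, name, exp_cell, act_cell))
    return diffs


if __name__ == "__main__":
    # Tiny inline tables modelled on the two rows that differ in the report.
    correct = (
        "+------------------+---------+\n"
        "| i_item_id        | agg6    |\n"
        "+------------------+---------+\n"
        "| AAAAAAAAAACCAAAA | NULL    |\n"
        "+------------------+---------+"
    )
    incorrect = (
        "opening /tmp/example/input.txt\n"
        "+------------------+---------+\n"
        "| i_item_id        | agg6    |\n"
        "+------------------+---------+\n"
        "| AAAAAAAAAACCAAAA | 38463209|\n"
        "+------------------+---------+\n"
    )
    for diff in cell_diffs(correct, incorrect):
        print(diff)
```

      Comparing parsed cells rather than raw lines keeps formatting noise out of the report, at the cost of assuming a stable row order between runs.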
      

      The problem is not reproducible when running the query from the Impala shell.
      The query is TPC-DS Q18a:

      with results as
       (select i_item_id,
              ca_country,
              ca_state,
              ca_county,
              cast(cs_quantity as decimal(12,2)) agg1,
              cast(cs_list_price as decimal(12,2)) agg2,
              cast(cs_coupon_amt as decimal(12,2)) agg3,
              cast(cs_sales_price as decimal(12,2)) agg4,
              cast(cs_net_profit as decimal(12,2)) agg5,
              cast(c_birth_year as decimal(12,2)) agg6,
              cast(cd1.cd_dep_count as decimal(12,2)) agg7
       from catalog_sales, customer_demographics cd1, customer_demographics cd2, customer, customer_address, date_dim, item
       where cs_sold_date_sk = d_date_sk and
             cs_item_sk = i_item_sk and
             cs_bill_cdemo_sk = cd1.cd_demo_sk and
             cs_bill_customer_sk = c_customer_sk and
             cd1.cd_gender = 'F' and
             cd1.cd_education_status = 'Unknown' and
             c_current_cdemo_sk = cd2.cd_demo_sk and
             c_current_addr_sk = ca_address_sk and
             c_birth_month in (1, 6, 8, 9, 12, 2) and
             d_year = 1998 and
             ca_state in ('MS', 'IN', 'ND', 'OK', 'NM', 'VA', 'MS')
       )
        select  i_item_id, ca_country, ca_state, ca_county, agg1, agg2, agg3, agg4, agg5, agg6, agg7
       from (
        select i_item_id, ca_country, ca_state, ca_county, avg(agg1) agg1,
          avg(agg2) agg2, avg(agg3) agg3, avg(agg4) agg4, avg(agg5) agg5, avg(agg6) agg6, avg(agg7) agg7
        from results
        group by i_item_id, ca_country, ca_state, ca_county
        union all
        select i_item_id, ca_country, ca_state, NULL as county, avg(agg1) agg1, avg(agg2) agg2, avg(agg3) agg3,
          avg(agg4) agg4, avg(agg5) agg5, avg(agg6) agg6, avg(agg7) agg7
        from results
        group by i_item_id, ca_country, ca_state
        union all
        select i_item_id, ca_country, NULL as ca_state, NULL as county, avg(agg1) agg1, avg(agg2) agg2, avg(agg3) agg3,
          avg(agg4) agg4, avg(agg5) agg5, avg(agg6) agg6, avg(agg7) agg7
        from results
        group by i_item_id, ca_country
        union all
        select i_item_id, NULL as ca_country, NULL as ca_state, NULL as county, avg(agg1) agg1, avg(agg2) agg2, avg(agg3) agg3,
          avg(agg4) agg4, avg(agg5) agg5, avg(agg6) agg6, avg(agg7) agg7
        from results
        group by i_item_id
        union all
        select NULL AS i_item_id, NULL as ca_country, NULL as ca_state, NULL as county, avg(agg1) agg1, avg(agg2) agg2, avg(agg3) agg3,
          avg(agg4) agg4, avg(agg5) agg5, avg(agg6) agg6, avg(agg7) agg7
        from results
       ) foo
       order by ca_country, ca_state, ca_county, i_item_id
       limit 100;
      

      cc'ing tarmstrong@cloudera.com, twm378

          People

            twmarshall Thomas Tauber-Marshall
            kwho Michael Ho