IMPALA-3487: stress test didn't fail on hash mismatch errors

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Infrastructure
    • Labels: None

      Description

      http://sandbox.jenkins.cloudera.com/job/Impala-Stress-Test-Physical/484/

      That's a green build, but the console log contains messages like this one:

      01:09:27 Process Process-80:
      01:09:27 Traceback (most recent call last):
      01:09:27   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
      01:09:27     self.run()
      01:09:27   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
      01:09:27     self._target(*self._args, **self._kwargs)
      01:09:27   File "tests/stress/concurrent_select.py", line 625, in _start_single_runner
      01:09:27     % (query.result_hash, report.result_hash, query.sql))
      01:09:27 Exception: Result hash mismatch; expected 3440873535184108466, got 3440873535183908922
      01:09:27 Query: select
      01:09:27   p_brand,
      01:09:27   p_type,
      01:09:27   p_size,
      01:09:27   count(distinct ps_suppkey) as supplier_cnt
      01:09:27 from
      01:09:27   partsupp,
      01:09:27   part
      01:09:27 where
      01:09:27   p_partkey = ps_partkey
      01:09:27   and p_brand <> 'Brand#45'
      01:09:27   and p_type not like 'MEDIUM POLISHED%'
      01:09:27   and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
      01:09:27   and ps_suppkey not in (
      01:09:27     select
      01:09:27       s_suppkey
      01:09:27     from
      01:09:27       supplier
      01:09:27     where
      01:09:27       s_comment like '%Customer%Complaints%'
      01:09:27   )
      01:09:27 group by
      01:09:27   p_brand,
      01:09:27   p_type,
      01:09:27   p_size
      01:09:27 order by
      01:09:27   supplier_cnt desc,
      01:09:27   p_brand,
      01:09:27   p_type,
      01:09:27   p_size
      01:09:27 01:09:27 24510 139687650531072 INFO:concurrent_select[374]:Checking for crashes
      01:09:27 01:09:27 24510 139687650531072 INFO:concurrent_select[377]:No crashes detected
      01:09:28  1284 |     100 |        292 |        0 |    136 |   0 |              623 |          212499 |       73720 |   61890
      01:09:33  1288 |     104 |        292 |        0 |    136 |   0 |              470 |          212463 |       73790 |   62473
      01:09:39  1293 |     106 |        292 |        0 |    138 |   0 |              613 |          212331 |       74650 |   63492
      01:09:44  1296 |     108 |        292 |        0 |    139 |   0 |             6656 |          211910 |       74870 |   64185
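
The "Process Process-80:" header and traceback above are what Python's multiprocessing module prints when a child process's target raises. The exception is written to stderr and the child exits with exitcode 1, but nothing is re-raised in the parent on its own. The following is a minimal sketch, assuming a runner structured like the one in the traceback, of how a parent could detect such a failure by checking the child's exit code. `_runner` and `run_and_check` are hypothetical names and are not functions from concurrent_select.py.

```python
import multiprocessing


def _runner(expected_hash, actual_hash):
    # Hypothetical stand-in for a stress-test runner: raise on a result
    # hash mismatch, like the exception shown in the log above.
    if expected_hash != actual_hash:
        raise Exception("Result hash mismatch; expected %s, got %s"
                        % (expected_hash, actual_hash))


def run_and_check(expected_hash, actual_hash):
    """Run one hypothetical runner in a child process; return True if it failed."""
    proc = multiprocessing.Process(target=_runner,
                                   args=(expected_hash, actual_hash))
    proc.start()
    proc.join()
    # When the target raises, multiprocessing prints the traceback to stderr
    # and sets exitcode to 1; the parent only sees the non-zero exit code.
    return proc.exitcode != 0


if __name__ == "__main__":
    failed = run_and_check(3440873535184108466, 3440873535183908922)
    print("runner failed: %s" % failed)
```

In this sketch the parent must act on the result of `run_and_check` itself (for example, by counting failures or exiting non-zero). A traceback in the console alone does not turn the build red.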
      

      In this particular run there were 10 such hash mismatch errors.

      A quick glance suggests that the build fails on this only after 10 so-called "successive" errors.
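
One way to make hash mismatches fail the build is to handle them separately from other query errors. The sketch below is hypothetical: `FailurePolicy`, `ResultHashMismatch` and the default threshold are illustrative only. They are neither the actual stress-test code nor the fix that shipped in 2.10.0. Under this policy, any result hash mismatch fails immediately, while other errors still need a run of successive occurrences.

```python
MAX_SUCCESSIVE_ERRORS = 10  # hypothetical default, mirroring the "10" above


class ResultHashMismatch(Exception):
    """Hypothetical marker exception for a wrong query result."""


class FailurePolicy(object):
    """Hypothetical policy deciding when the stress test should fail the build."""

    def __init__(self, max_successive_errors=MAX_SUCCESSIVE_ERRORS):
        self.max_successive_errors = max_successive_errors
        self.successive_errors = 0

    def record(self, error):
        """Record one query outcome (error is None on success).

        Returns True if the build should fail now.
        """
        if error is None:
            # A success breaks any run of successive errors.
            self.successive_errors = 0
            return False
        if isinstance(error, ResultHashMismatch):
            # A wrong result is fatal on its own; it never counts toward a threshold.
            return True
        self.successive_errors += 1
        return self.successive_errors >= self.max_successive_errors
```

The design choice is that a wrong result is a correctness bug rather than load-related noise, so a single occurrence is enough to fail the run.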


    People

    • Assignee: Unassigned
    • Reporter: Michael Brown (mikesbrown)
    • Votes: 0
    • Watchers: 1
