IMPALA-11114

calculate_tval fails with ZeroDivisionError if the standard deviations are 0


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: Impala 4.1.0

    Description

      Possible cause:

      Rounding or other truncation of the data can produce a standard deviation of zero even when the underlying measurements do vary. If the difference being measured is within the measurement error, that is a problem the t-test does not address.

      https://stats.stackexchange.com/questions/78570/t-test-with-sample-standard-deviation-of-zero-possible/275879
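
The rounding effect described above can be reproduced in a few lines. This is only an illustration: the timing values and the two-decimal rounding are made up for the example and are not taken from the benchmark results in this issue.

```python
import statistics

# Hypothetical query timings in seconds (illustrative values only).
raw = [1.2041, 1.2012, 1.1987, 1.2033]

# Rounding to two decimals collapses every sample to the same value.
rounded = [round(t, 2) for t in raw]

raw_stddev = statistics.stdev(raw)          # small but non-zero
rounded_stddev = statistics.stdev(rounded)  # exactly 0.0

print(rounded)
print(raw_stddev, rounded_stddev)
```

If both the new and the reference runs end up with a standard deviation of 0 this way, the standard error used as the divisor of the t value is 0 as well.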

      Full log:

      Traceback (most recent call last):
        File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 1131, in <module>
          report = Report(grouped, ref_grouped)
        File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 494, in __init__
          self.__analyze()
        File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 514, in __analyze
          query_comparison_row = Report.QueryComparisonRow(results, ref_results)
        File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 370, in __init__
          self.__check_perf_change_significance(results, ref_results))
        File "/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py", line 390, in __check_perf_change_significance
          ref_stat[AVG], ref_stat[STDDEV], ref_stat[ITERATIONS])
        File "/home/gfurnstahl/Impala/tests/util/calculation_util.py", line 65, in calculate_tval
          return (avg - ref_avg) / sem
      ZeroDivisionError: float division by zero
      Traceback (most recent call last):
        File "bin/single_node_perf_run.py", line 359, in <module>
          main()
        File "bin/single_node_perf_run.py", line 349, in main
          perf_ab_test(options, args)
        File "bin/single_node_perf_run.py", line 267, in perf_ab_test
          compare(temp_dir, hash_a, hash_b)
        File "bin/single_node_perf_run.py", line 175, in compare
          report_benchmark_results(file_a, file_b, description)
        File "bin/single_node_perf_run.py", line 166, in report_benchmark_results
          stdout=f)
        File "/home/gfurnstahl/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/subprocess.py", line 190, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/home/gfurnstahl/Impala/tests/benchmark/report_benchmark_results.py', '--reference_result_file=/home/gfurnstahl/Impala/perf_results/perf_run_0SdUw7/a87f8c5df9f6fbf8d468921642d7ec3d37c5f4de.json', '--input_result_file=/home/gfurnstahl/Impala/perf_results/perf_run_0SdUw7/b4d04112559c3f04ebf42b36deb1cd537dea78c4.json', '--report_description="a87f8c5df9f6fbf8d468921642d7ec3d37c5f4de vs b4d04112559c3f04ebf42b36deb1cd537dea78c4"']' returned non-zero exit status 1
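
One possible way to guard the failing division, `(avg - ref_avg) / sem`, is sketched below. This is not the fix that was committed for this issue. The function name `calculate_tval_guarded`, the parameter order (inferred from the call site in the traceback), the standard-error formula, and the values returned when `sem` is 0 are all assumptions made for illustration.

```python
import math


def calculate_tval_guarded(avg, stddev, iters, ref_avg, ref_stddev, ref_iters):
    """Hypothetical guarded variant of a t-value calculation.

    Assumes the standard error of the difference of two means:
    sem = sqrt(stddev^2 / iters + ref_stddev^2 / ref_iters).
    """
    # float() keeps the division exact under Python 2 as well as Python 3.
    sem = math.sqrt(float(stddev) ** 2 / iters + float(ref_stddev) ** 2 / ref_iters)
    diff = avg - ref_avg
    if sem == 0:
        # Both samples report zero variance, so the t statistic is undefined.
        # Assumed policy: 0.0 when the means agree, otherwise a signed infinity.
        if diff == 0:
            return 0.0
        return math.copysign(float("inf"), diff)
    return diff / sem


print(calculate_tval_guarded(10.0, 0.0, 5, 10.0, 0.0, 5))
print(calculate_tval_guarded(12.0, 2.0, 4, 10.0, 2.0, 4))
```

Returning a signed infinity treats any difference between two zero-variance samples as maximally significant. A caller might instead prefer to skip the significance check for such queries. Which behavior is appropriate depends on how the report consumes the t value.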

          People

            gfurnstahl Gergely Fürnstáhl