IMPALA-4433

timestamp_col NDV intermittently incorrect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:
    • Environment:
      c3.4xlarge, ubuntu 14.04

      Description

      I can reproduce this in 2 out of 6 runs. Amos Bird also saw it:

      https://lists.apache.org/thread.html/6b63e7fbc7f86cd2253cdaadca41667b8c59d8cf89a95fd441e2f704@%3Cdev.impala.apache.org%3E

      From my Jenkins console:

       TestDdlStatements.test_truncate_table[exec_option: {'batch_size': 0, 'num_nodes': 0, 'sync_ddl': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0] 
      [gw2] linux2 -- Python 2.7.6 /home/ubuntu/tmp.AXxcDDTCpm/bin/../infra/python/env/bin/python
      metadata/test_ddl.py:182: in test_truncate_table
          multiple_impalad=self._use_multiple_impalad(vector))
      common/impala_test_suite.py:340: in run_test_case
          self.__verify_results_and_errors(vector, test_section, result, use_db)
      common/impala_test_suite.py:234: in __verify_results_and_errors
          replace_filenames_with_placeholder)
      common/test_result_verifier.py:398: in verify_raw_results
          VERIFIER_MAP[verifier](expected, actual)
      common/test_result_verifier.py:231: in verify_query_result_is_equal
          assert expected_results == actual_results
      E   assert Comparing QueryTestResults (expected vs actual):
      E     Detailed information truncated, use "-vv" to show
      ---------------------------- Captured stderr setup -----------------------------
      SET sync_ddl=True;
      -- executing against localhost:21000
      DROP DATABASE IF EXISTS `test_truncate_table_d04c0434` CASCADE;
      
      SET sync_ddl=True;
      -- executing against localhost:21000
      CREATE DATABASE `test_truncate_table_d04c0434`;
      
      MainThread: Created database "test_truncate_table_d04c0434" for test ID "metadata/test_ddl.py::TestDdlStatements::()::test_truncate_table[exec_option: {'batch_size': 0, 'num_nodes': 0, 'sync_ddl': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0]"
      ----------------------------- Captured stderr call -----------------------------
      -- executing against localhost:21000
      use test_truncate_table_d04c0434;
      
      SET batch_size=0;
      SET num_nodes=0;
      SET sync_ddl=0;
      SET disable_codegen=False;
      SET abort_on_error=False;
      SET exec_single_node_rows_threshold=0;
      -- executing against localhost:21000
      create table t1 like functional.alltypes
      location '/test-warehouse/test_truncate_table_d04c0434.db/t1';
      
      -- executing against localhost:21000
      
      insert into t1 partition(year, month) select * from functional.alltypes;
      
      -- executing against localhost:21000
      
      compute incremental stats t1;
      
      -- executing against localhost:21000
      
      show table stats t1;
      
      -- executing against localhost:21000
      show column stats t1;
      
      MainThread: Comparing QueryTestResults (expected vs actual):
      'bigint_col','BIGINT',10,-1,8,8 == 'bigint_col','BIGINT',10,-1,8,8
      'bool_col','BOOLEAN',2,-1,1,1 == 'bool_col','BOOLEAN',2,-1,1,1
      'date_string_col','STRING',736,-1,8,8 == 'date_string_col','STRING',736,-1,8,8
      'double_col','DOUBLE',10,-1,8,8 == 'double_col','DOUBLE',10,-1,8,8
      'float_col','FLOAT',10,-1,4,4 == 'float_col','FLOAT',10,-1,4,4
      'id','INT',7505,-1,4,4 == 'id','INT',7505,-1,4,4
      'int_col','INT',10,-1,4,4 == 'int_col','INT',10,-1,4,4
      'month','INT',12,0,4,4 == 'month','INT',12,0,4,4
      'smallint_col','SMALLINT',10,-1,2,2 == 'smallint_col','SMALLINT',10,-1,2,2
      'string_col','STRING',10,-1,1,1 == 'string_col','STRING',10,-1,1,1
      'timestamp_col','TIMESTAMP',7554,-1,16,16 != 'timestamp_col','TIMESTAMP',7552,-1,16,16
      'tinyint_col','TINYINT',10,-1,1,1 == 'tinyint_col','TINYINT',10,-1,1,1
      'year','INT',2,0,4,4 == 'year','INT',2,0,4,4
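
The comparison above has only one differing row, and it is easy to miss among the matching ones. The following is a minimal sketch, not part of the Impala test framework, that picks the NDV mismatches out of such a log. The function name `ndv_mismatches` and the `SAMPLE` text are hypothetical. It assumes each row follows the `show column stats` field order (Column, Type, #Distinct Values, #Nulls, Max Size, Avg Size) and that expected and actual rows are joined by ` == ` or ` != `, as in the log above.

```python
import csv
import re


def ndv_mismatches(log_text):
    """Return (column, expected_ndv, actual_ndv) for each row whose NDV differs.

    Hypothetical helper: assumes the "expected OP actual" line layout seen in
    the captured test output, with single-quoted string fields.
    """
    mismatches = []
    for line in log_text.splitlines():
        # Split into [expected_row, operator, actual_row]; skip other lines.
        parts = re.split(r" (==|!=) ", line.strip())
        if len(parts) != 3 or parts[1] != "!=":
            continue
        expected = next(csv.reader([parts[0]], quotechar="'"))
        actual = next(csv.reader([parts[2]], quotechar="'"))
        # Field 0 is the column name, field 2 is "#Distinct Values" (NDV).
        if expected[2] != actual[2]:
            mismatches.append((expected[0], int(expected[2]), int(actual[2])))
    return mismatches


# Three rows copied from the comparison in this report.
SAMPLE = """\
'string_col','STRING',10,-1,1,1 == 'string_col','STRING',10,-1,1,1
'timestamp_col','TIMESTAMP',7554,-1,16,16 != 'timestamp_col','TIMESTAMP',7552,-1,16,16
'tinyint_col','TINYINT',10,-1,1,1 == 'tinyint_col','TINYINT',10,-1,1,1
"""

for column, expected_ndv, actual_ndv in ndv_mismatches(SAMPLE):
    print(f"{column}: expected NDV {expected_ndv}, got {actual_ndv}")
# prints: timestamp_col: expected NDV 7554, got 7552
```

Run against the full comparison, this should report only timestamp_col, the single `!=` row in the log.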
      

      To reproduce, run the test on an EC2 c3.4xlarge instance set up with https://github.com/apache/incubator-impala/blob/master/bin/bootstrap_development.sh from a stock Ubuntu 14.04 AMI.

      Following Amos Bird's thread, I checked the column stats for alltypes, which report an NDV of 7552 for timestamp_col:

      Query: show column stats alltypes
      +-----------------+-----------+------------------+--------+----------+----------+
      | Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
      +-----------------+-----------+------------------+--------+----------+----------+
      | id              | INT       | 7505             | -1     | 4        | 4        |
      | bool_col        | BOOLEAN   | 2                | -1     | 1        | 1        |
      | tinyint_col     | TINYINT   | 10               | -1     | 1        | 1        |
      | smallint_col    | SMALLINT  | 10               | -1     | 2        | 2        |
      | int_col         | INT       | 10               | -1     | 4        | 4        |
      | bigint_col      | BIGINT    | 10               | -1     | 8        | 8        |
      | float_col       | FLOAT     | 10               | -1     | 4        | 4        |
      | double_col      | DOUBLE    | 10               | -1     | 8        | 8        |
      | date_string_col | STRING    | 736              | -1     | 8        | 8        |
      | string_col      | STRING    | 10               | -1     | 1        | 1        |
      | timestamp_col   | TIMESTAMP | 7552             | -1     | 16       | 16       |
      | year            | INT       | 2                | 0      | 4        | 4        |
      | month           | INT       | 12               | 0      | 4        | 4        |
      +-----------------+-----------+------------------+--------+----------+----------+
      

              People

              • Assignee:
                jbapple Jim Apple
              • Reporter:
                jbapple Jim Apple