Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Infrastructure
    • Labels:
      None

      Description

      The random query generator's QueryProfile should be extended to support deciding the following for INSERT operations:

      • The table into which to perform the INSERT
      • Whether the INSERT will have a SELECT clause or a VALUES clause
      • Whether the INSERT SELECT, if chosen, will have a column set, or if we are inserting into the whole table
      • Choosing the columns if a column set is needed
      • If a VALUES clause is chosen, choosing the number of rows added and those rows’ values

        Activity

        Show
        mikesbrown Michael Brown added a comment - https://gerrit.cloudera.org/#/c/5486/
        Hide
        mikesbrown Michael Brown added a comment -
        commit db7facdee0017965e276c830e2aee16590f7ce37
        Author: Michael Brown <mikeb@cloudera.com>
        Date:   Wed Dec 7 14:20:05 2016 -0800
        
            IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements
        
            - Generate INSERT statements that are either INSERT ... VALUES or INSERT
              ... SELECT
        
            - On both types of INSERTs, we either insert into all columns, or into
              some column list. If the column list exists, all primary keys will be
              present, and 0 or more additional columns will also be in the list.
              The ordering of the column list is random.
        
            - For INSERT ... SELECT, occasionally generate a WITH clause
        
            - For INSERT ... VALUES, generate non-null constants for the primary
              keys, but for the non-primary keys, randomly generate a value
              expression.
        
            The type system in the random statement/query generator isn't
            sophisticated enough to the implicit type of a SELECT item or a value
            expression. It knows it will be some INT-based type, but not if it's
            going to be a SMALLINT or a BIGINT. To get around this, the easiest
            thing seems to be to explicitly cast the SELECT items or value
            expressions to the columns' so-called exact_type attribute.
        
            Much of the testing here involved running discrepancy_searcher.py
            --explain-only on both tpch_kudu and a random HDFS table, using both the
            default profile and DML-only profile. This was done to quickly find bugs
            in the statement generation, as they tend to bubble up as analysis
            errors. I expect to make other changes as follow on patches and more
            random statements find small test issues.
        
            For actual use against Kudu data, you need to migrate data from Kudu
            into PostgreSQL 5 (instructions tests/comparison/POSTGRES.txt) and run
            something like:
        
            tests/comparison/discrepancy_searcher.py \
              --use-postgresql \
              --postgresql-port 5433 \
              --profile dmlonly \
              --timeout 300 \
              --db-name tpch_kudu \
              --query-count 10
        
            Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6
            Reviewed-on: http://gerrit.cloudera.org:8080/5486
            Reviewed-by: Jim Apple <jbapple-impala@apache.org>
            Tested-by: Impala Public Jenkins
        
        Show
        mikesbrown Michael Brown added a comment - commit db7facdee0017965e276c830e2aee16590f7ce37 Author: Michael Brown <mikeb@cloudera.com> Date: Wed Dec 7 14:20:05 2016 -0800 IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements - Generate INSERT statements that are either INSERT ... VALUES or INSERT ... SELECT - On both types of INSERTs, we either insert into all columns, or into some column list. If the column list exists, all primary keys will be present, and 0 or more additional columns will also be in the list. The ordering of the column list is random. - For INSERT ... SELECT, occasionally generate a WITH clause - For INSERT ... VALUES, generate non-null constants for the primary keys, but for the non-primary keys, randomly generate a value expression. The type system in the random statement/query generator isn't sophisticated enough to the implicit type of a SELECT item or a value expression. It knows it will be some INT-based type, but not if it's going to be a SMALLINT or a BIGINT. To get around this, the easiest thing seems to be to explicitly cast the SELECT items or value expressions to the columns' so-called exact_type attribute. Much of the testing here involved running discrepancy_searcher.py --explain-only on both tpch_kudu and a random HDFS table, using both the default profile and DML-only profile. This was done to quickly find bugs in the statement generation, as they tend to bubble up as analysis errors. I expect to make other changes as follow on patches and more random statements find small test issues. For actual use against Kudu data, you need to migrate data from Kudu into PostgreSQL 5 (instructions tests/comparison/POSTGRES.txt) and run something like: tests/comparison/discrepancy_searcher.py \ --use-postgresql \ --postgresql-port 5433 \ --profile dmlonly \ --timeout 300 \ --db-name tpch_kudu \ --query-count 10 Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6 Reviewed-on: http://gerrit.cloudera.org:8080/5486 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins

          People

          • Assignee:
            mikesbrown Michael Brown
            Reporter:
            mikesbrown Michael Brown
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development