IMPALA-3718: Improve functional testing on Impala Kudu

    Details

      Description

      The functional test coverage for Impala on Kudu can be significantly improved by doing the following:

      • Run TPC-H and TPC-DS against Kudu tables
      • Add an alltypes test table that covers all the supported Kudu data types and expand existing functional tests to use this table
      • Add more complex queries in the planner tests that query both Kudu and HDFS tables
      • Add more complex expressions (e.g. conditional functions) in the SET portion of UPDATE statements in functional query tests
      • Improve the test coverage for using and computing stats in Kudu tables


          Activity

          Matthew Jacobs (mjacobs) added a comment:

          commit c7fa03286b473a34cdb170f8c89c261fb02d17a6
          Author: Matthew Jacobs <mj@cloudera.com>
          Date: Mon Aug 29 15:00:23 2016 -0700

          IMPALA-3718: Support subset of functional-query for Kudu

          Adds initial support for the functional-query test workload
          for Kudu tables.

          There are a few issues that make loading the functional
          schema difficult on Kudu:
          1) Kudu tables must have one or more columns that together
          constitute a unique primary key.
          a) Primary key columns must currently be the first columns
          in the table definition (KUDU-1271).
          b) Primary key columns cannot be nullable (KUDU-1570).
          2) Kudu tables must be specified with distribution
          parameters.
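An illustrative sketch, not taken from the Impala code base, of the schema rules listed above. It assumes a hypothetical `Column` record and checks only what the commit message states: at least one key column, key columns leading the table definition (KUDU-1271), and no nullable key columns (KUDU-1570). Uniqueness depends on the data rather than the schema, so it is checked separately over sample rows. Distribution parameters (item 2) are table options, not column properties, and are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Column:
    """Hypothetical column description used only for this sketch."""
    name: str
    type: str
    nullable: bool = True


def kudu_pk_problems(columns: Sequence[Column], pk: Sequence[str]) -> List[str]:
    """Return reasons why `pk` cannot be a Kudu primary key under the rules above."""
    if not pk:
        return ["at least one primary key column is required"]
    by_name = {c.name: c for c in columns}
    missing = [k for k in pk if k not in by_name]
    if missing:
        return ["unknown key columns: " + ", ".join(missing)]
    problems = []
    # KUDU-1271: the key columns must be the leading columns, in the given order.
    if [c.name for c in columns[:len(pk)]] != list(pk):
        problems.append("primary key columns must be the first columns")
    # KUDU-1570: key columns cannot be nullable.
    nullable = [k for k in pk if by_name[k].nullable]
    if nullable:
        problems.append("nullable key columns: " + ", ".join(nullable))
    return problems


def key_is_unique(rows: Iterable[Sequence], key_positions: Sequence[int]) -> bool:
    """True if no two rows share the same values at `key_positions`."""
    seen = set()
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        if key in seen:
            return False
        seen.add(key)
    return True


# Usage with a cut-down, alltypes-like schema.
cols = [Column("id", "INT", nullable=False), Column("bool_col", "BOOLEAN")]
print(kudu_pk_problems(cols, ["id"]))        # []
print(kudu_pk_problems(cols, ["bool_col"]))  # both rules violated
print(key_is_unique([(1, True), (2, True)], [0]))
```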

          (1) limits the tables that can be loaded without ugly
          workarounds. This patch only includes important tables that
          are used for relevant tests, most notably the alltypes*
          family. In particular, alltypesagg is important but it does
          not have a set of columns that are non-nullable and form a unique
          primary key. As a result, that table is created in Kudu with
          a different name and an additional BIGINT column for a PK
          that is a unique index and is generated at data loading time
          using the ROW_NUMBER analytic function. A view is then
          wrapped around the underlying table that matches the
          alltypesagg schema exactly. When KUDU-1570 is resolved, this
          can be simplified.
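A minimal sketch of the alltypesagg workaround, modeled in plain Python rather than SQL. `add_row_number_key` plays the role of ROW_NUMBER() at load time: it numbers rows from 1, so every row gets a distinct, non-NULL key even when the original columns contain duplicates or NULLs. `view_rows` plays the role of the view that hides the extra column. The key column name `row_id` is hypothetical; the commit message does not give the real name.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_row_number_key(rows: List[Row], key: str = "row_id") -> List[Row]:
    """Prefix each row with a unique 1-based number, as ROW_NUMBER() would.

    The key is placed first so it is the leading column (KUDU-1271).
    """
    return [{key: n, **row} for n, row in enumerate(rows, start=1)]


def view_rows(stored: List[Row], key: str = "row_id") -> List[Row]:
    """Project away the synthetic key, like the view over the Kudu table."""
    return [{col: val for col, val in row.items() if col != key} for row in stored]


# Two identical rows containing a NULL (None): no natural key exists.
original = [{"id": None, "day": 1}, {"id": None, "day": 1}]
stored = add_row_number_key(original)
print([row["row_id"] for row in stored])  # [1, 2]
print(view_rows(stored) == original)      # True
```

Because the view drops only the synthetic column and keeps the others in their original order, queries written against alltypesagg see the same schema on Kudu as on HDFS.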

          (2) requires some additional considerations and custom
          syntax. As a result, the DDL to create the tables is
          explicitly specified in CREATE_KUDU sections in the
          functional_schema_constraints.csv, and an additional
          DEPENDENT_LOAD_KUDU section was added to specify custom data
          loading DML that differs from the existing DEPENDENT_LOAD.
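A rough sketch of how a loader might choose between these sections. It assumes a simplified template in which each section begins with a `---- NAME` header line, and it assumes a table falls back to DEPENDENT_LOAD when it has no DEPENDENT_LOAD_KUDU section. Neither detail is confirmed by the commit message, and this does not reproduce what generate_schema_statements.py actually does. The SQL bodies are placeholders.

```python
import re
from typing import Dict, List, Optional

SECTION_HEADER = re.compile(r"^---- (\w+)\s*$")


def parse_sections(text: str) -> Dict[str, str]:
    """Split template text into {section name: body} using '---- NAME' headers."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def kudu_ddl(sections: Dict[str, str]) -> Optional[str]:
    """Kudu DDL is taken verbatim from CREATE_KUDU; None means no Kudu table."""
    return sections.get("CREATE_KUDU") or None


def load_dml(sections: Dict[str, str], file_format: str) -> Optional[str]:
    """Prefer DEPENDENT_LOAD_KUDU for Kudu, else DEPENDENT_LOAD (assumed fallback)."""
    if file_format == "kudu" and sections.get("DEPENDENT_LOAD_KUDU"):
        return sections["DEPENDENT_LOAD_KUDU"]
    return sections.get("DEPENDENT_LOAD")


TEMPLATE = """\
---- BASE_TABLE_NAME
alltypesagg
---- CREATE_KUDU
<placeholder: Kudu DDL with primary key and distribution>
---- DEPENDENT_LOAD
<placeholder: generic load DML>
---- DEPENDENT_LOAD_KUDU
<placeholder: Kudu-specific load DML>
"""

sections = parse_sections(TEMPLATE)
print(load_dml(sections, "kudu"))     # <placeholder: Kudu-specific load DML>
print(load_dml(sections, "parquet"))  # <placeholder: generic load DML>
```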

          TODO: IMPALA-4005: generate_schema_statements.py needs refactoring

          Tests that are not relevant or not yet supported have been
          marked with xfail and a skip where appropriate.
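"xfail" and "skip" are pytest terms. As a dependency-free illustration of the distinction, here is the same idea with the standard library's unittest: a skipped test is never run, while an expected failure runs and is reported as a known gap rather than an error. The `TABLE_FORMAT` switch, the `xfail_if` helper, and the test names are hypothetical and do not reflect Impala's actual test-vector mechanism.

```python
import io
import unittest

# Hypothetical switch; Impala's tests get the table format from their test vector.
TABLE_FORMAT = "kudu"


def xfail_if(condition: bool):
    """Mark a test as an expected failure only when `condition` holds."""
    return unittest.expectedFailure if condition else (lambda func: func)


class KuduAwareTests(unittest.TestCase):
    @unittest.skipIf(TABLE_FORMAT == "kudu", "not relevant for Kudu tables")
    def test_hdfs_only_behavior(self):
        self.fail("should have been skipped")

    @xfail_if(TABLE_FORMAT == "kudu")
    def test_not_yet_supported_on_kudu(self):
        # Stand-in for a query whose Kudu result is known to be wrong today.
        self.assertEqual("kudu result", "expected result")

    def test_supported_everywhere(self):
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(KuduAwareTests)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(len(result.skipped), len(result.expectedFailures), result.wasSuccessful())
```

The expected-failure marker keeps the test running, so once Kudu support lands the test starts passing and unittest reports an unexpected success, which is a prompt to remove the marker.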

          TODO: Support remaining functional tables/tests when possible.

          Change-Id: Iada88e078352e4462745d9a9a1b5111260d21acc
          Reviewed-on: http://gerrit.cloudera.org:8080/4175
          Reviewed-by: Matthew Jacobs <mj@cloudera.com>
          Tested-by: Internal Jenkins


            People

            • Assignee:
              Matthew Jacobs (mjacobs)
              Reporter:
              Dimitris Tsirogiannis (dtsirogiannis)
            • Votes:
              0
              Watchers:
              2
