IMPALA-5162

Support Kerberized+SSL TPC-H nested data loading

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Infrastructure
    • Labels:
      None

      Description

      load_nested.py should be enhanced to work against a Kerberized, SSL-enabled Impala cluster. It uses the "Cluster" abstraction hierarchy in tests.comparison.cluster and the connection hierarchy in tests.comparison.db_connection, so the patch will involve adding SSL support to these abstractions and repairing their bitrotted Kerberos test code.

      Testing the patch should cover loading of nested data as well as the following regression tests, which are needed because they also use the "Cluster" abstraction hierarchy:

      • full data load
      • data load with snapshot
      • loading of random data into Impala
      • copying data from Impala to PostgreSQL
      • stress test run (smoke)
      • query generator run (smoke)

        Activity

        mikesbrown Michael Brown added a comment -
        commit 8b459dffec9e093e87da9ab6e8b2e5a9de50a7bd
        Author: Michael Brown <mikeb@cloudera.com>
        Date:   Fri Mar 31 10:39:54 2017 -0700
        
            IMPALA-5162,IMPALA-5163: stress test support on secure clusters
        
            This patch adds support for running the stress test
            (concurrent_select.py) and loading nested data (load_nested.py) into a
            Kerberized, SSL-enabled Impala cluster. It assumes the calling user
            already has a valid Kerberos ticket. One way to do that is:
        
            1. Get access to a keytab and krb5.config
            2. Set KRB5_CONFIG and KRB5CCNAME appropriately
            3. Run kinit(1)
            4. Run load_nested.py and/or concurrent_select.py within this
               environment.
        
            Because our Python clients already support Kerberos and SSL, we simply
            need to make sure to use the correct options when calling the entry
            points and initializing the clients:
        
            Impala: Impyla
            Hive: Impyla
            HDFS: hdfs.ext.kerberos.KerberosClient
        
            With this patch, I was able to manually do a short concurrent_select.py
            run against a secure cluster without connection or auth errors, and I
            was able to do the same with load_nested.py for a cluster that already
            had TPC-H loaded.
        
            Follow-ons for future cleanup work:
        
            IMPALA-5263: support CA bundles when running stress test against SSL'd
                         Impala
        
            IMPALA-5264: fix InsecurePlatformWarning under stress test with SSL
        
            Change-Id: I0daad57bb8ceeb5071b75125f11c1997ed7e0179
            Reviewed-on: http://gerrit.cloudera.org:8080/6763
            Reviewed-by: Matthew Mulder <mmulder@cloudera.com>
            Reviewed-by: Alex Behm <alex.behm@cloudera.com>
            Tested-by: Impala Public Jenkins
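
The setup steps and client options described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the helper names (kerberos_env, kinit_command, impyla_connect_kwargs, open_impala_connection) are hypothetical, and the keytab-based kinit invocation assumes MIT Kerberos. The keyword arguments are the ones Impyla's impala.dbapi.connect accepts for Kerberos (auth_mechanism="GSSAPI", kerberos_service_name) and SSL (use_ssl).

```python
import os


def kerberos_env(krb5_config, ccache, base_env=None):
    """Return a copy of the environment with KRB5_CONFIG and KRB5CCNAME set.

    Hypothetical helper: both paths are supplied by the caller.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["KRB5_CONFIG"] = krb5_config
    env["KRB5CCNAME"] = ccache
    return env


def kinit_command(keytab, principal):
    """Build a kinit(1) command line that obtains a ticket from a keytab.

    Assumes MIT Kerberos, where -k requests keytab authentication and
    -t names the keytab file.
    """
    return ["kinit", "-k", "-t", keytab, principal]


def impyla_connect_kwargs(host, port=21050, use_kerberos=False, use_ssl=False,
                          kerberos_service_name="impala"):
    """Build keyword arguments for impala.dbapi.connect.

    For a HiveServer2 connection through Impyla, the service name would
    typically be "hive" and the port the HiveServer2 port (an assumption
    about the target cluster, not something the patch dictates).
    """
    kwargs = {"host": host, "port": port, "use_ssl": use_ssl}
    if use_kerberos:
        kwargs["auth_mechanism"] = "GSSAPI"
        kwargs["kerberos_service_name"] = kerberos_service_name
    return kwargs


def open_impala_connection(**kwargs):
    """Open an Impyla connection; requires the impyla package and a valid ticket."""
    from impala.dbapi import connect  # imported lazily so the helpers work without impyla
    return connect(**kwargs)
```

With a ticket already obtained (for example by running the kinit_command(...) result through subprocess with env=kerberos_env(...)), a connection could then be opened with open_impala_connection(**impyla_connect_kwargs(host, use_kerberos=True, use_ssl=True)). Certificate verification is left at Impyla's defaults here; CA bundle support is tracked separately in IMPALA-5263.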
        

          People

          • Assignee:
            mikesbrown Michael Brown
            Reporter:
            mikesbrown Michael Brown