IMPALA-4081

Class rename will break old external data sources

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      IMPALA-3786 involves renaming all Java packages to org.apache.impala from com.cloudera.impala.

      We have an "external data source" API which exposes an interface, com.cloudera.impala.extdatasource.v1.ExternalDataSource, that external data sources can implement and register with Impala. Changing the package names will break data sources compiled against the old interface name. We need to either preserve backwards compatibility or accept the breaking change and recommend an upgrade path.
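
One way to gauge the impact on an existing data source is to check whether its jar still refers to the old package. The sketch below is only a heuristic, not part of Impala: it treats a jar as a zip archive and looks for the old prefix in JVM internal form (slashes instead of dots) inside each .class entry. The helper names and the demo entries are hypothetical, the demo bytes are not valid bytecode, and the org/apache form of the interface name is inferred from the rename described above.

```python
import io
import zipfile

# Old package prefix in JVM internal form (slashes), as class files store names.
OLD_PREFIX = b"com/cloudera/impala/"


def classes_referencing_old_package(jar, old_prefix=OLD_PREFIX):
    """Return sorted names of .class entries whose bytes contain old_prefix.

    `jar` may be a filesystem path or a binary file-like object.
    This is a byte-level search, not a real class-file parser.
    """
    hits = []
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if name.endswith(".class") and old_prefix in zf.read(name):
                hits.append(name)
    return sorted(hits)


def build_demo_jar():
    """Build an in-memory stand-in for a jar (hypothetical entries, fake bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "com/example/OldSource.class",
            b"\xca\xfe\xba\xbe com/cloudera/impala/extdatasource/v1/ExternalDataSource",
        )
        zf.writestr(
            "com/example/NewSource.class",
            b"\xca\xfe\xba\xbe org/apache/impala/extdatasource/v1/ExternalDataSource",
        )
    buf.seek(0)
    return buf
```

Calling classes_referencing_old_package on the path of a real data source jar would list the classes that likely need to be rebuilt against the renamed interface. An empty list does not prove compatibility, since the search only looks at raw bytes.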

        Activity

        jbapple Jim Apple added a comment -

        My feeling is that this belongs as a comment (on IMPALA-3786) that stops it from proceeding until we have a solution, rather than as its own JIRA. What do you think?

        mjacobs Matthew Jacobs added a comment -

        Yup, there is a comment there as well, but I filed this so it wouldn't get lost. Given that Thomas was out I was hoping we could get some attention on this sooner. I think we could and probably should fix backwards compatibility as a follow-up task, assuming we have a plan for it.

        jbapple Jim Apple added a comment -

        So you would prefer that we not roll this fix and the package rename into a single patch?

        mjacobs Matthew Jacobs added a comment -

        That change is very mechanical; this seems more involved and requires new logic and upgrade testing. It seems worth pulling out. I don't care, though, as long as it doesn't get buried; I'd say it's up to the author and primary reviewer.

        It's also worth considering whether we just break compatibility and require data sources to be rebuilt. AFAIK this isn't widely used, so maybe that's not crazy.
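
If rebuilding turns out to be the chosen path, a rebuilt jar might also need to be registered again. That step is an assumption, not something this thread confirms. The sketch below only builds the statement text, using the DROP DATA SOURCE and CREATE DATA SOURCE shapes that appear in the test log later in this thread. The function name and the example values are hypothetical.

```python
import re

# Conservative identifier check for the data source name (an assumption of this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def reregister_statements(name, jar_location, class_name, api_version="V1"):
    """Return the DROP and CREATE statements for re-registering a data source."""
    if not _IDENTIFIER.match(name):
        raise ValueError("unexpected data source name: %r" % (name,))
    for value in (jar_location, class_name, api_version):
        # Refuse values that would need quoting; this sketch does no escaping.
        if "'" in value:
            raise ValueError("single quote not supported in: %r" % (value,))
    return [
        "DROP DATA SOURCE IF EXISTS {0};".format(name),
        "CREATE DATA SOURCE {0} LOCATION '{1}' CLASS '{2}' API_VERSION '{3}';".format(
            name, jar_location, class_name, api_version
        ),
    ]
```

The statements are returned as strings rather than executed, so they can be reviewed before being run through whichever client is in use.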

        jbapple Jim Apple added a comment -

        That makes sense. Let's see how Thomas feels.

        mjacobs Matthew Jacobs added a comment -

        Assigning to Justin while we wait to hear back on whether the new code needs to support previously compiled data sources, or whether requiring recompilation is OK.

        dhecht Dan Hecht added a comment -

        Ping Justin Erickson, Greg Rahn

        justin@cloudera.com Justin Erickson added a comment -

        Still waiting for a response from the account team. The main person was on vacation and is now at Strata, so I will ping again after Strata calms down.

        hsheinblatt_impala_e511 Harrison Sheinblatt added a comment -

        I think this test failure may be caused by IMPALA-3786. If so, and we decide not to fix it, we may need to fix a test: http://sandbox.jenkins.sf.cloudera.com/job/impala-umbrella-build-and-test/4899/console

        03:34:53 =================================== FAILURES ===================================
        03:34:53  TestQueriesTextTables.test_data_source_tables[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
        03:34:53 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
        03:34:53 query_test/test_queries.py:155: in test_data_source_tables
        03:34:53     self.run_test_case('QueryTest/data-source-tables', vector)
        03:34:53 common/impala_test_suite.py:320: in run_test_case
        03:34:53     result = self.__execute_query(target_impalad_client, query, user=user)
        03:34:53 common/impala_test_suite.py:511: in __execute_query
        03:34:53     return impalad_client.execute(query, user=user)
        03:34:53 common/impala_connection.py:160: in execute
        03:34:53     return self.__beeswax_client.execute(sql_stmt, user=user)
        03:34:53 beeswax/impala_beeswax.py:173: in execute
        03:34:53     handle = self.__execute_query(query_string.strip(), user=user)
        03:34:53 beeswax/impala_beeswax.py:337: in __execute_query
        03:34:53     handle = self.execute_query_async(query_string, user=user)
        03:34:53 beeswax/impala_beeswax.py:333: in execute_query_async
        03:34:53     return self.__do_rpc(lambda: self.imp_service.query(query,))
        03:34:53 beeswax/impala_beeswax.py:458: in __do_rpc
        03:34:53     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
        03:34:53 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
        03:34:53 E    INNER EXCEPTION: <class 'impala._thrift_gen.beeswax.ttypes.BeeswaxException'>
        03:34:53 E    MESSAGE: NoClassDefFoundError: com/cloudera/impala/extdatasource/v1/ExternalDataSource
        03:34:53 E   CAUSED BY: ClassNotFoundException: com.cloudera.impala.extdatasource.v1.ExternalDataSource
        03:34:53 ----------------------------- Captured stderr call -----------------------------
        03:34:53 -- executing against localhost:21000
        03:34:53 use functional;
        03:34:53 
        03:34:53 SET disable_codegen=False;
        03:34:53 SET abort_on_error=1;
        03:34:53 SET exec_single_node_rows_threshold=0;
        03:34:53 SET batch_size=0;
        03:34:53 SET num_nodes=0;
        03:34:53 -- executing against localhost:21000
        03:34:53 select *
        03:34:53 from alltypes_datasource
        03:34:53 where float_col != 0 and
        03:34:53       int_col >= 1990 limit 5;
        03:34:53 
        03:34:53  generated xml file: /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/ee_tests/results/TEST-impala-parallel.xml 
        

        Same error in following local build: http://sandbox.jenkins.sf.cloudera.com/job/impala-umbrella-build-and-test/4918/console

        hsheinblatt_impala_e511 Harrison Sheinblatt added a comment -

        Similar failure in local run for cdh, different test but may be related: http://sandbox.jenkins.sf.cloudera.com/view/Impala/view/Evergreen-cdh5-trunk/job/impala-cdh5-trunk-core-local-filesystem/154/

        09:22:45 =================================== FAILURES ===================================
        09:22:45  TestLibCache.test_create_drop_data_src[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
        09:22:45 metadata/test_ddl.py:453: in test_create_drop_data_src
        09:22:45     self.create_drop_ddl(vector, create_stmts, drop_stmts, select_stmt, num_iterations)
        09:22:45 metadata/test_ddl.py:494: in create_drop_ddl
        09:22:45     self.client.execute(select_stmt)
        09:22:45 common/impala_connection.py:160: in execute
        09:22:45     return self.__beeswax_client.execute(sql_stmt, user=user)
        09:22:45 beeswax/impala_beeswax.py:173: in execute
        09:22:45     handle = self.__execute_query(query_string.strip(), user=user)
        09:22:45 beeswax/impala_beeswax.py:337: in __execute_query
        09:22:45     handle = self.execute_query_async(query_string, user=user)
        09:22:45 beeswax/impala_beeswax.py:333: in execute_query_async
        09:22:45     return self.__do_rpc(lambda: self.imp_service.query(query,))
        09:22:45 beeswax/impala_beeswax.py:458: in __do_rpc
        09:22:45     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
        09:22:45 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
        09:22:45 E    INNER EXCEPTION: <class 'impala._thrift_gen.beeswax.ttypes.BeeswaxException'>
        09:22:45 E    MESSAGE: InternalException: Error calling prepare() on data source DataSource{name=test_create_drop_data_src_a7bc14f7_datasrc, location=file:/tmp/test-warehouse/data-sources/test-data-source.jar, className=org.apache.impala.extdatasource.AllTypesDataSource, apiVersion=V1}
        09:22:45 E   CAUSED BY: ImpalaRuntimeException: Unable to load external data source library from path=/tmp/test-data-source.13373.0.jar className=org.apache.impala.extdatasource.AllTypesDataSource apiVersion=V1
        09:22:45 E   CAUSED BY: ClassNotFoundException: org.apache.impala.extdatasource.AllTypesDataSource
        09:22:45 ---------------------------- Captured stderr setup -----------------------------
        09:22:45 -- connecting to: localhost:21000
        09:22:45 SET sync_ddl=False;
        09:22:45 -- executing against localhost:21000
        09:22:45 DROP DATABASE IF EXISTS `test_create_drop_data_src_a7bc14f7` CASCADE;
        09:22:45 
        09:22:45 SET sync_ddl=False;
        09:22:45 -- executing against localhost:21000
        09:22:45 CREATE DATABASE `test_create_drop_data_src_a7bc14f7`;
        09:22:45 
        09:22:45 MainThread: Created database "test_create_drop_data_src_a7bc14f7" for test ID "metadata/test_ddl.py::TestLibCache::()::test_create_drop_data_src[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]"
        09:22:45 ----------------------------- Captured stderr call -----------------------------
        09:22:45 SET disable_codegen=False;
        09:22:45 SET abort_on_error=1;
        09:22:45 SET exec_single_node_rows_threshold=0;
        09:22:45 SET batch_size=0;
        09:22:45 SET num_nodes=1;
        09:22:45 -- executing against localhost:21000
        09:22:45 drop table if exists test_create_drop_data_src_a7bc14f7.data_src_tbl;
        09:22:45 
        09:22:45 -- executing against localhost:21000
        09:22:45 drop data source if exists test_create_drop_data_src_a7bc14f7_datasrc;
        09:22:45 
        09:22:45 -- executing against localhost:21000
        09:22:45 CREATE DATA SOURCE test_create_drop_data_src_a7bc14f7_datasrc LOCATION 'file:/tmp/test-warehouse/data-sources/test-data-source.jar' CLASS 'org.apache.impala.extdatasource.AllTypesDataSource' API_VERSION 'V1';
        09:22:45 
        09:22:45 -- executing against localhost:21000
        09:22:45 CREATE TABLE test_create_drop_data_src_a7bc14f7.data_src_tbl (x int) PRODUCED BY DATA SOURCE test_create_drop_data_src_a7bc14f7_datasrc('dummy_init_string');
        09:22:45 
        09:22:45 -- executing against localhost:21000
        09:22:45 select * from test_create_drop_data_src_a7bc14f7.data_src_tbl limit 1;
        09:22:45 
        09:22:45  generated xml file: /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/ee_tests/results/TEST-impala-serial.xml 
        09:22:45 =========================== short test summary info ============================
        09:22:45 FAIL metadata/test_ddl.py::TestLibCache::()::test_create_drop_data_src[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]
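
The two logs above fail on different class names: the first cannot find a class under the old com.cloudera.impala package, the second cannot find one under the new org.apache.impala package. A small triage helper, sketched here with hypothetical names, pulls the class out of each ClassNotFoundException line and labels it by package. Reading "old" or "new" as a particular root cause is an interpretation and is not confirmed anywhere in this thread.

```python
import re

# Matches cause lines such as
# "CAUSED BY: ClassNotFoundException: org.apache.impala.extdatasource.AllTypesDataSource"
_CLASS_NOT_FOUND = re.compile(r"ClassNotFoundException:\s*([\w.$]+)")

OLD_PACKAGE = "com.cloudera.impala."
NEW_PACKAGE = "org.apache.impala."


def classify_missing_classes(log_text):
    """Map each class named in a ClassNotFoundException to 'old', 'new', or 'other'."""
    result = {}
    for class_name in _CLASS_NOT_FOUND.findall(log_text):
        if class_name.startswith(OLD_PACKAGE):
            result[class_name] = "old"
        elif class_name.startswith(NEW_PACKAGE):
            result[class_name] = "new"
        else:
            result[class_name] = "other"
    return result
```

Feeding the console text of a failed run to classify_missing_classes gives a quick view of which side of the rename each missing class is on.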
        
        mjacobs Matthew Jacobs added a comment -

        I think these failures only affect local filesystem runs, which skip a full data load. I think this is due to https://gerrit.cloudera.org/#/c/3384/, which resolves https://issues.cloudera.org/browse/IMPALA-3737 and seems to imply this is expected for non-HDFS filesystems. I'm re-running the test now to verify. If this is the case, I'll open a separate non-blocker JIRA to address handling schema changes better on local filesystems, and this JIRA should focus on the issue of back-compat for external data sources.

        mjacobs Matthew Jacobs added a comment -

        FYI the jobs are passing now; I think those runs were operating in a stale environment. If those failures come up again, let's file a separate JIRA. I'd like to keep this one to track the issue of backward compatibility.

        justin@cloudera.com Justin Erickson added a comment -

        Confirmed that the account that was using this internal API is no longer using it.

        mjacobs Matthew Jacobs added a comment -

        Thanks Justin Erickson. Downgrading and closing since we don't need to make it backwards compatible then.


          People

          • Assignee:
            justin@cloudera.com Justin Erickson
            Reporter:
            mjacobs Matthew Jacobs
          • Votes:
            0
            Watchers:
            5