  1. Hive
  2. HIVE-12679

Allow users to be able to specify an implementation of IMetaStoreClient via HiveConf

    Details

      Description

      Hi,

      I would like to propose a change that lets users choose an implementation of IMetaStoreClient via HiveConf, i.e. hive-site.xml. Currently, the choice is hard-coded to SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive. The hard-coded class name in Hive.java is the only direct reference to SessionHiveMetaStoreClient, and the QL component operates only on the IMetaStoreClient interface, so the change would be minimal. It would be quite similar to how an implementation of RawStore is specified and loaded in hive-metastore. One use case this change would serve is a user who wishes to use an implementation of this interface without a dependency on the Thrift server.

      Thank you,
      Austin
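
To illustrate the proposal above, here is a minimal sketch of picking an implementation class by name from configuration and instantiating it reflectively. It is not taken from the attached patch: `MetaStoreClient` is a one-method stand-in for IMetaStoreClient, `java.util.Properties` stands in for HiveConf, and the property name `example.metastore.client.impl` is hypothetical.

```java
import java.util.Properties;

public class MetaStoreClientLoaderSketch {

    /** Stand-in for Hive's IMetaStoreClient, reduced to one method for illustration. */
    public interface MetaStoreClient {
        String describe();
    }

    /** Stand-in for the client that is hard-coded today. */
    public static class DefaultClient implements MetaStoreClient {
        public DefaultClient(Properties conf) { }
        @Override public String describe() { return "default"; }
    }

    /** Hypothetical alternative implementation supplied by a user. */
    public static class CustomClient implements MetaStoreClient {
        public CustomClient(Properties conf) { }
        @Override public String describe() { return "custom"; }
    }

    // Hypothetical property name, used only in this sketch.
    static final String CLIENT_IMPL_KEY = "example.metastore.client.impl";

    /** Reads the class name from the configuration, falling back to the default client. */
    static MetaStoreClient load(Properties conf) throws ReflectiveOperationException {
        String className = conf.getProperty(CLIENT_IMPL_KEY, DefaultClient.class.getName());
        Class<? extends MetaStoreClient> cls =
            Class.forName(className).asSubclass(MetaStoreClient.class);
        // Assumes every implementation exposes a public constructor taking the configuration.
        return cls.getConstructor(Properties.class).newInstance(conf);
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        System.out.println(load(conf).describe());

        conf.setProperty(CLIENT_IMPL_KEY, CustomClient.class.getName());
        System.out.println(load(conf).describe());
    }
}
```

Callers only ever see the interface type, which is why swapping the concrete class needs no change outside the loading code.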

      1. HIVE-12679.1.patch
        13 kB
        Austin Lee
      2. HIVE-12679.2.patch
        14 kB
        Austin Lee
      3. HIVE-12679.patch
        13 kB
        Austin Lee

        Activity

        thejas Thejas M Nair added a comment -

        Note that SessionHiveMetaStoreClient implements logic for temp tables, as they have a lifetime of a session. I am wondering if this config should determine which IMetaStoreClient impl is used by SessionHiveMetaStoreClient, instead of replacing SessionHiveMetaStoreClient itself.
        Some changes to SessionHiveMetaStoreClient would be needed as well for that approach, so that it accepts the configured IMetaStoreClient.

        I am curious about the use case here. Are you interested in replacing the use of the Thrift server because of any particular issues you see with it?
        Note that the metastore server can be used in embedded mode, which means that the metastore client effectively talks directly to the persistent store (RDBMS, or HBase in the case of the HBase metastore).

        The hbase metastore work might possibly interest you - https://issues.apache.org/jira/browse/HIVE-9452, https://cwiki.apache.org/confluence/display/Hive/HBaseMetastoreDevelopmentGuide

        austintlee Austin Lee added a comment -

        Thanks for your suggestion on the approach. If I understand you correctly, instead of having SessionHiveMetaStoreClient extend HiveMetaStoreClient directly via inheritance, we can accomplish what I am proposing via composition, i.e., by creating a new member of type IMetaStoreClient in SessionHiveMetaStoreClient and using HiveConf to determine its concrete implementation at runtime? I was thinking of putting this logic in SessionHiveMetaStoreClient, but looking at the latest code in 2.1-snapshot, your approach might make more sense.

        As for the use case that I have in mind, I am really after more flexibility, e.g. not having dependency on Thrift, not having to run in embedded mode to eliminate dependency on Thrift, etc.
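
The composition idea discussed above could look roughly like the sketch below. It is an illustration under stated assumptions, not code from Hive or from the patch: `MetaStoreClient` is a one-method stand-in for IMetaStoreClient, and `BackingClient`, `SessionClient`, and `createTempTable` are hypothetical names. The session-level wrapper keeps temp tables itself and forwards everything else to whichever client it was given.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionClientCompositionSketch {

    /** Stand-in for IMetaStoreClient, reduced to a single lookup method. */
    public interface MetaStoreClient {
        String getTable(String name);
    }

    /** Stand-in for any configured backing client (Thrift-based or otherwise). */
    public static class BackingClient implements MetaStoreClient {
        private final Map<String, String> tables = new HashMap<>();

        public BackingClient() {
            tables.put("sales", "persistent:sales");
        }

        @Override public String getTable(String name) {
            return tables.get(name);
        }
    }

    /** Holds session-scoped temp tables and delegates all other lookups. */
    public static class SessionClient implements MetaStoreClient {
        private final MetaStoreClient delegate;
        private final Map<String, String> tempTables = new HashMap<>();

        public SessionClient(MetaStoreClient delegate) {
            this.delegate = delegate;
        }

        public void createTempTable(String name) {
            tempTables.put(name, "temp:" + name);
        }

        @Override public String getTable(String name) {
            // A temp table shadows a persistent table of the same name within the session.
            String temp = tempTables.get(name);
            return temp != null ? temp : delegate.getTable(name);
        }
    }

    public static void main(String[] args) {
        SessionClient client = new SessionClient(new BackingClient());
        System.out.println(client.getTable("sales"));
        client.createTempTable("sales");
        System.out.println(client.getTable("sales"));
    }
}
```

With this shape the temp-table logic stays in one place regardless of which backing implementation is configured.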

        austintlee Austin Lee added a comment -

        I meant my original thinking was to put the logic to pick up the actual implementation in Hive.java, not in SessionHiveMetaStoreClient.java.

        austintlee Austin Lee added a comment -

        The patch contains changes to move the IMetaStoreClient construction logic into a factory class and use a HiveConf to load this factory class at runtime.
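
As a rough illustration of the factory idea described above (not the actual patch contents), the sketch below loads a factory class named in configuration and asks it for a client. `MetaStoreClientFactory`, `DefaultFactory`, `CustomFactory`, and the property name `example.metastore.client.factory.class` are hypothetical, and `java.util.Properties` stands in for HiveConf.

```java
import java.util.Properties;

public class ClientFactorySketch {

    /** Stand-in for IMetaStoreClient, reduced to one method. */
    public interface MetaStoreClient {
        String describe();
    }

    /** Hypothetical factory interface; construction logic lives behind it. */
    public interface MetaStoreClientFactory {
        MetaStoreClient create(Properties conf);
    }

    /** Default factory, standing in for the existing construction logic. */
    public static class DefaultFactory implements MetaStoreClientFactory {
        @Override public MetaStoreClient create(Properties conf) {
            return () -> "session client";
        }
    }

    /** A user-supplied factory that builds a different client. */
    public static class CustomFactory implements MetaStoreClientFactory {
        @Override public MetaStoreClient create(Properties conf) {
            return () -> "custom client";
        }
    }

    // Hypothetical property name, used only in this sketch.
    static final String FACTORY_KEY = "example.metastore.client.factory.class";

    /** Loads the configured factory by class name, then lets it build the client. */
    static MetaStoreClient newClient(Properties conf) throws ReflectiveOperationException {
        String name = conf.getProperty(FACTORY_KEY, DefaultFactory.class.getName());
        MetaStoreClientFactory factory = Class.forName(name)
            .asSubclass(MetaStoreClientFactory.class)
            .getDeclaredConstructor()   // assumes a public no-argument constructor
            .newInstance();
        return factory.create(conf);
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        System.out.println(newClient(conf).describe());

        conf.setProperty(FACTORY_KEY, CustomFactory.class.getName());
        System.out.println(newClient(conf).describe());
    }
}
```

Compared with loading the client class directly, a factory can decide how to wire the client (for example, wrapping it in session-level logic) without the caller knowing any constructor signatures.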

        austintlee Austin Lee added a comment -

        The patch contains changes to move the IMetaStoreClient construction logic into a factory class and use a HiveConf to load this factory class at runtime.

        austintlee Austin Lee added a comment -

        Could someone please review the attached patch? Thanks in advance for your time.

        hiveqa Hive QA added a comment -

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12789271/HIVE-12679.patch

        SUCCESS: +1 due to 1 test(s) being added or modified.

        ERROR: -1 due to 4 failed/errored test(s), 9822 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
        org.apache.hadoop.hive.ql.metadata.TestHive.testLoadingHiveMetaStoreClientFactory
        org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testLoadingHiveMetaStoreClientFactory
        org.apache.hive.jdbc.TestSSL.testSSLVersion
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7074/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7074/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7074/

        Messages:

        Executing org.apache.hive.ptest.execution.TestCheckPhase
        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 4 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12789271 - PreCommit-HIVE-TRUNK-Build

        austintlee Austin Lee added a comment -

        looking into the failed tests

        austintlee Austin Lee added a comment -

        unit test fixed

        austintlee Austin Lee added a comment -

        code review

        hiveqa Hive QA added a comment -

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12789639/HIVE-12679.1.patch

        SUCCESS: +1 due to 1 test(s) being added or modified.

        ERROR: -1 due to 4 failed/errored test(s), 9816 tests executed
        Failed tests:

        TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
        org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
        org.apache.hadoop.hive.ql.TestTxnCommands2.testInitiatorWithMultipleFailedCompactions
        org.apache.hive.jdbc.TestSSL.testSSLVersion
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7094/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7094/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7094/

        Messages:

        Executing org.apache.hive.ptest.execution.TestCheckPhase
        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 4 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12789639 - PreCommit-HIVE-TRUNK-Build

        alangates Alan Gates added a comment -

        I looked over the patch. The code itself seems fine. The question I have is about the approach. There are several features tied into SessionHiveMetaStoreClient and HiveMetaStoreClient (temp tables, metastore hooks, how to connect to a remote metastore, as well as the new file footer cache).

        I'd like to better understand what flexibility you need. If you just want to avoid connecting to the Thrift server, that can be accomplished in the current code (e.g., HS2 usually runs this way, and the fast-path stuff in there runs this way). Is there some feature you need that can't be added to HiveMetaStoreClient or SessionHiveMetaStoreClient?

        austintlee Austin Lee added a comment -

        These failures are not related to my change.

        hiveqa Hive QA added a comment -

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12805930/HIVE-12679.2.patch

        SUCCESS: +1 due to 1 test(s) being added or modified.

        ERROR: -1 due to 7 failed/errored test(s), 10221 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
        org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
        org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
        

        Test results: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/517/testReport
        Console output: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/517/console
        Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-517/

        Messages:

        Executing org.apache.hive.ptest.execution.TestCheckPhase
        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 7 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12805930 - PreCommit-HIVE-MASTER-Build

        rleidle Rob Leidle added a comment -

        Thejas M Nair, can you take another look?


          People

          • Assignee: austintlee Austin Lee
          • Reporter: austintlee Austin Lee
          • Votes: 2
          • Watchers: 10