Hive
  1. Hive
  2. HIVE-4226

Cleanup non-threadsafe code in Hive

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      There is some code in Hive that is not threadsafe. These usually bubble up as problems in Hive Server. This JIRA tracks fixing (hopefully, all) of these issues.

      Some context: we've implemented a multi-tenant (multiple dbs), multi-threaded hive server at Qubole (QHS) which is running in production for a couple of months now. As part of this effort, we've fixed a number of instances of non-threadsafe code. I'm looking to contribute this back to the community.

      Note that there is no new functionality here - just some better hygiene. If there are any stress tests that have revealed hive server bugs in the past, it will be great if they can be added to the jira.

      Also, this is my first attempt at contributing to Apache, so please forgive any mistakes.

      1. HIVE-4226.1.patch.txt
        45 kB
        Sivaramakrishnan Narayanan
      2. HIVE-4226.2.patch.txt
        43 kB
        Sivaramakrishnan Narayanan
      3. HIVE-4226.D11703.1.patch
        40 kB
        Phabricator

        Activity

        Sivaramakrishnan Narayanan created issue -
        Hide
        Shreepadma Venugopalan added a comment -

        Sivaramakrishnan Narayanan: Thank you very much for contributing this patch to the project. I've a question regarding the QHS. Does this build on the existing HiveServer or is this something you guys have built from scratch?

        Show
        Shreepadma Venugopalan added a comment - Sivaramakrishnan Narayanan : Thank you very much for contributing this patch to the project. I've a question regarding the QHS. Does this build on the existing HiveServer or is this something you guys have built from scratch?
        Hide
        Shreepadma Venugopalan added a comment -

        HIVE-4141, HIVE-4075 are relevant and were recently fixed.

        Show
        Shreepadma Venugopalan added a comment - HIVE-4141 , HIVE-4075 are relevant and were recently fixed.
        Hide
        Sivaramakrishnan Narayanan added a comment -

        Shreepadma Venugopalan QHS was built from scratch. I will be writing a blog about it this week and send it around. Basically, Qubole offers hive as a service to multiple accounts. Each account has a separate metastore (and separate hadoop cluster). Each of these accounts may submit queries concurrently. Hence, we required a multi-tenant, multi-threaded hive server.

        However, the API itself is pretty modest compared to existing HiveServer. The API simply asks QHS to execute a set of statements. This is because Qubole maintains a notion of session outside of HiveServer and passes state modifying statements (e.g. set statements) as part of an 'execute' call to QHS.

        Getting QHS to run smoothly required a few other changes which are not directly related to this JIRA, but might well be applicable to any incarnation of HiveServer. I'll write these up as well and, hopefully, someone will find them useful.

        Show
        Sivaramakrishnan Narayanan added a comment - Shreepadma Venugopalan QHS was built from scratch. I will be writing a blog about it this week and send it around. Basically, Qubole offers hive as a service to multiple accounts. Each account has a separate metastore (and separate hadoop cluster). Each of these accounts may submit queries concurrently. Hence, we required a multi-tenant, multi-threaded hive server. However, the API itself is pretty modest compared to existing HiveServer. The API simply asks QHS to execute a set of statements. This is because Qubole maintains a notion of session outside of HiveServer and passes state modifying statements (e.g. set statements) as part of an 'execute' call to QHS. Getting QHS to run smoothly required a few other changes which are not directly related to this JIRA, but might well be applicable to any incarnation of HiveServer. I'll write these up as well and, hopefully, someone will find them useful.
        Hide
        Eric Hanson added a comment -

        This sounds like a great addition. We're working to speed up Hive with vectorized query execution and are thinking ahead about reducing process overhead by re-using processes and introducing multi-threading. This Jira should help with that. Please post a list of known issues or a design discussion if you have one.

        Show
        Eric Hanson added a comment - This sounds like a great addition. We're working to speed up Hive with vectorized query execution and are thinking ahead about reducing process overhead by re-using processes and introducing multi-threading. This Jira should help with that. Please post a list of known issues or a design discussion if you have one.
        Hide
        Sivaramakrishnan Narayanan added a comment -

        Attaching a patch I created against 0.11. Gist of the patch is:

        • number of statics have been converted to threadlocals
        • some unsafe caches (around inputformats) which relied on statics have been removed.
        • in function registry, I separated the native functions (that live in a static) and temporary ones (these live in threadlocal)
        • changes in ObjectStore to switch from statics to local vars
        Show
        Sivaramakrishnan Narayanan added a comment - Attaching a patch I created against 0.11. Gist of the patch is: number of statics have been converted to threadlocals some unsafe caches (around inputformats) which relied on statics have been removed. in function registry, I separated the native functions (that live in a static) and temporary ones (these live in threadlocal) changes in ObjectStore to switch from statics to local vars
        Sivaramakrishnan Narayanan made changes -
        Field Original Value New Value
        Attachment HIVE-4226.1.patch.txt [ 12591680 ]
        Hide
        Sivaramakrishnan Narayanan added a comment -

        Patch doesn't apply cleanly to trunk. Might need a day to figure this out and submit a new patch against trunk.

        Show
        Sivaramakrishnan Narayanan added a comment - Patch doesn't apply cleanly to trunk. Might need a day to figure this out and submit a new patch against trunk.
        Hide
        Sivaramakrishnan Narayanan added a comment -

        Attaching new version (HIVE-4226.2.patch.txt) of patch that applies to trunk.

        Show
        Sivaramakrishnan Narayanan added a comment - Attaching new version ( HIVE-4226 .2.patch.txt) of patch that applies to trunk.
        Sivaramakrishnan Narayanan made changes -
        Attachment HIVE-4226.2.patch.txt [ 12591691 ]
        Ashutosh Chauhan made changes -
        Assignee Sivaramakrishnan Narayanan [ snarayanan ]
        Hide
        Ashutosh Chauhan added a comment -

        Thanks, Sivaramakrishnan Narayanan for taking a crack at this one. Quite useful.

        Show
        Ashutosh Chauhan added a comment - Thanks, Sivaramakrishnan Narayanan for taking a crack at this one. Quite useful.
        Hide
        Ashutosh Chauhan added a comment -

        Sivaramakrishnan Narayanan Can you create a phabricator or RB entry for this, so that its easier to review?

        Show
        Ashutosh Chauhan added a comment - Sivaramakrishnan Narayanan Can you create a phabricator or RB entry for this, so that its easier to review?
        Hide
        Sivaramakrishnan Narayanan added a comment -

        I'm not sure what the right way to create phabricator review is. Is there a guide for how to do this the right way?

        I've created a review using the patch here: https://reviews.facebook.net/D11697 Please let me know if it is accessible.

        Show
        Sivaramakrishnan Narayanan added a comment - I'm not sure what the right way to create phabricator review is. Is there a guide for how to do this the right way? I've created a review using the patch here: https://reviews.facebook.net/D11697 Please let me know if it is accessible.
        Hide
        Sivaramakrishnan Narayanan added a comment -

        I found the guide. Will create a new one the right way: https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview

        Show
        Sivaramakrishnan Narayanan added a comment - I found the guide. Will create a new one the right way: https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview
        Hide
        Phabricator added a comment -

        tarball requested code review of "HIVE-4226 [jira] Cleanup non-threadsafe code in Hive".

        Reviewers: JIRA

        HIVE-4226. Fixing some non-threadsafe code in hive

        There is some code in Hive that is not threadsafe. These usually bubble up as problems in Hive Server. This JIRA tracks fixing (hopefully, all) of these issues.

        Some context: we've implemented a multi-tenant (multiple dbs), multi-threaded hive server at Qubole (QHS) which is running in production for a couple of months now. As part of this effort, we've fixed a number of instances of non-threadsafe code. I'm looking to contribute this back to the community.

        Note that there is no new functionality here - just some better hygiene. If there are any stress tests that have revealed hive server bugs in the past, it will be great if they can be added to the jira.

        Also, this is my first attempt at contributing to Apache, so please forgive any mistakes.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D11703

        AFFECTED FILES
        metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
        ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java
        ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java
        ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
        ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
        ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
        ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveRecordReader.java
        ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
        ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
        ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java
        ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java

        MANAGE HERALD RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/27753/

        To: JIRA, tarball

        Show
        Phabricator added a comment - tarball requested code review of " HIVE-4226 [jira] Cleanup non-threadsafe code in Hive". Reviewers: JIRA HIVE-4226 . Fixing some non-threadsafe code in hive There is some code in Hive that is not threadsafe. These usually bubble up as problems in Hive Server. This JIRA tracks fixing (hopefully, all) of these issues. Some context: we've implemented a multi-tenant (multiple dbs), multi-threaded hive server at Qubole (QHS) which is running in production for a couple of months now. As part of this effort, we've fixed a number of instances of non-threadsafe code. I'm looking to contribute this back to the community. Note that there is no new functionality here - just some better hygiene. If there are any stress tests that have revealed hive server bugs in the past, it will be great if they can be added to the jira. Also, this is my first attempt at contributing to Apache, so please forgive any mistakes. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11703 AFFECTED FILES metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveRecordReader.java ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/27753/ To: JIRA, tarball
        Phabricator made changes -
        Attachment HIVE-4226.D11703.1.patch [ 12592545 ]
        Hide
        Sivaramakrishnan Narayanan added a comment -

        FYI - I ran into this problem. Fortunately, the solution in the email chain worked for me.

        Show
        Sivaramakrishnan Narayanan added a comment - FYI - I ran into this problem. Fortunately, the solution in the email chain worked for me.
        Hide
        Sivaramakrishnan Narayanan added a comment -

        Also, the blog link that describes Qubole Hive Server which was the motivation for this patch is here.

        Show
        Sivaramakrishnan Narayanan added a comment - Also, the blog link that describes Qubole Hive Server which was the motivation for this patch is here .
        Hide
        Phabricator added a comment -

        brock has commented on the revision "HIVE-4226 [jira] Cleanup non-threadsafe code in Hive".

        minor comment

        INLINE COMMENTS
        ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:170 Can we tighten up the visibility? For example I believe this and the one below can be private.

        REVISION DETAIL
        https://reviews.facebook.net/D11703

        To: JIRA, tarball
        Cc: brock

        Show
        Phabricator added a comment - brock has commented on the revision " HIVE-4226 [jira] Cleanup non-threadsafe code in Hive". minor comment INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:170 Can we tighten up the visibility? For example I believe this and the one below can be private. REVISION DETAIL https://reviews.facebook.net/D11703 To: JIRA, tarball Cc: brock

          People

          • Assignee:
            Sivaramakrishnan Narayanan
            Reporter:
            Sivaramakrishnan Narayanan
          • Votes:
            1 Vote for this issue
            Watchers:
            15 Start watching this issue

            Dates

            • Created:
              Updated:

              Development