Details

    • Type: Sub-task Sub-task
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: JDBC, Metastore, ODBC, Security
    • Labels:
      None

      Description

      MetaStoreListener and HiveMetaHook both serve as a notification mechanism for metastore-related events. The former is used by hcat and the latter is by the hbase-storage handler, and invoked by the client.
      I propose to merge these interfaces, and extend the MetaStoreListener, to add most of the on- and pre- methods at the Thrift interface. This way, extending metastore will be easier, and validation, storage-driver notification, and enforcement can be delegated to individual listeners. Besides, more functionality can be plugged-in by Hcat at this level.

        Activity

        Hide
        Enis Soztutar added a comment -

        MetaStoreListener.onXXX() methods take an Event object to wrap the function call context, however HiveMetaHook functions does not. We can do either way, but the first one will introduce a lot of classes (50+) in case we do add corresponding methods.

        Since we want both preXXX() and onXXX() methods, we can do smt like:
        preXXX() takes normal function args + context (containing back pointer to handler)
        onXXX() takes normal function args + returned object + context (containing back pointer to handler + method status)

        Anyone care to comment ?

        Show
        Enis Soztutar added a comment - MetaStoreListener.onXXX() methods take an Event object to wrap the function call context, however HiveMetaHook functions does not. We can do either way, but the first one will introduce a lot of classes (50+) in case we do add corresponding methods. Since we want both preXXX() and onXXX() methods, we can do smt like: preXXX() takes normal function args + context (containing back pointer to handler) onXXX() takes normal function args + returned object + context (containing back pointer to handler + method status) Anyone care to comment ?
        Hide
        Phabricator added a comment -

        enis requested code review of "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".
        Reviewers: JIRA

        HIVE-2720. Merge MetaStoreListener and HiveMetaHook interfaces. First version of the patch

        MetaStoreListener and HiveMetaHook both serve as a notification mechanism for metastore-related events. The former is used by hcat and the latter is by the hbase-storage handler, and invoked by the client.
        I propose to merge these interfaces, and extend the MetaStoreListener, to add most of the on- and pre- methods at the Thrift interface. This way, extending metastore will be easier, and validation, storage-driver notification, and enforcement can be delegated to individual listeners. Besides, more functionality can be plugged-in by Hcat at this level.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        AFFECTED FILES
        hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHookLoader.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListenerLoader.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateDatabaseEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropDatabaseEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/ListenerEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java

        MANAGE HERALD DIFFERENTIAL RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/2703/

        Tip: use the X-Herald-Rules header to filter Herald messages in your client.

        Show
        Phabricator added a comment - enis requested code review of " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". Reviewers: JIRA HIVE-2720 . Merge MetaStoreListener and HiveMetaHook interfaces. First version of the patch MetaStoreListener and HiveMetaHook both serve as a notification mechanism for metastore-related events. The former is used by hcat and the latter is by the hbase-storage handler, and invoked by the client. I propose to merge these interfaces, and extend the MetaStoreListener, to add most of the on- and pre- methods at the Thrift interface. This way, extending metastore will be easier, and validation, storage-driver notification, and enforcement can be delegated to individual listeners. Besides, more functionality can be plugged-in by Hcat at this level. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D1299 AFFECTED FILES hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHookLoader.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListenerLoader.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateDatabaseEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropDatabaseEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/ListenerEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/2703/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.
        Hide
        Phabricator added a comment -

        enis updated the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".
        Reviewers: JIRA

        Fixed lint warning for long line.

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        AFFECTED FILES
        hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHookLoader.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListenerLoader.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateDatabaseEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropDatabaseEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/ListenerEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java

        Show
        Phabricator added a comment - enis updated the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". Reviewers: JIRA Fixed lint warning for long line. REVISION DETAIL https://reviews.facebook.net/D1299 AFFECTED FILES hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHookLoader.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListenerLoader.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateDatabaseEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropDatabaseEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/ListenerEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
        Hide
        Phabricator added a comment -

        enis has commented on the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".

        Some justifications on why we want to do the changes in this patch:

        • In the parent issue to this, https://issues.apache.org/jira/browse/HIVE-2241, we are introducing authorization checks in the metastore. With this patch, the authorization enforcer for the metastore can be implemented as a MetaStoreListener.
        • In a future issue (no jiras yet), we want to move HBase storage handler notifications from the client, to the metastore server. So that, the clients does not have to link hbase, and the metastore server can better enforce data sync.
        • again in a future issue (no jiras yet), we want to be optionally delegate ACL checks to the storage layer ( related: HIVE-1943 ), so with this patch, the storage handler can implement it's own authorization and validation checks.

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        Show
        Phabricator added a comment - enis has commented on the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". Some justifications on why we want to do the changes in this patch: In the parent issue to this, https://issues.apache.org/jira/browse/HIVE-2241 , we are introducing authorization checks in the metastore. With this patch, the authorization enforcer for the metastore can be implemented as a MetaStoreListener. In a future issue (no jiras yet), we want to move HBase storage handler notifications from the client, to the metastore server. So that, the clients does not have to link hbase, and the metastore server can better enforce data sync. again in a future issue (no jiras yet), we want to be optionally delegate ACL checks to the storage layer ( related: HIVE-1943 ), so with this patch, the storage handler can implement it's own authorization and validation checks. REVISION DETAIL https://reviews.facebook.net/D1299
        Hide
        Phabricator added a comment -

        enis has added reviewers to the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".
        Added Reviewers: ashutoshc, heyongqiang

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        Show
        Phabricator added a comment - enis has added reviewers to the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". Added Reviewers: ashutoshc, heyongqiang REVISION DETAIL https://reviews.facebook.net/D1299
        Hide
        Phabricator added a comment -

        cwsteinbach has requested changes to the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".

        INLINE COMMENTS
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:105 Are the onXXXX methods called after a metastore operation has been successfully committed? If so should the name be changed to postXXX to make this clear? What happens if an onXXX method throws an exception?

        I assume that preXXX methods can block operations by throwing exceptions. If that's accurate this should be explained in the javadoc.

        Also, if these methods can do more than just listen, e.g. act, then I think the name of this interface needs to be changed to make that clear.

        ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java:83 Formatting. The same error is repeated elsewhere.
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:199 A bunch of these methods don't appear to be hooked up yet.

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        Show
        Phabricator added a comment - cwsteinbach has requested changes to the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:105 Are the onXXXX methods called after a metastore operation has been successfully committed? If so should the name be changed to postXXX to make this clear? What happens if an onXXX method throws an exception? I assume that preXXX methods can block operations by throwing exceptions. If that's accurate this should be explained in the javadoc. Also, if these methods can do more than just listen, e.g. act, then I think the name of this interface needs to be changed to make that clear. ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java:83 Formatting. The same error is repeated elsewhere. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:199 A bunch of these methods don't appear to be hooked up yet. REVISION DETAIL https://reviews.facebook.net/D1299
        Hide
        Phabricator added a comment -

        enis updated the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".
        Reviewers: JIRA, ashutoshc, heyongqiang, cwsteinbach

        Applied review suggestions by Carl Steinbach.

        • Renamesd onXXX methods to postXXX
        • Instead of using MetaStoreEventListener, we now extend HiveMetaHook.
        • Added hooks for all the methods in the metastore.
        • fixed formatting on some lines
        • Note that HiveMetaHook is an interface, and when we add new methods there,
          the current implementors(hcat) will not compile, unless they add those methods stubs.
          However, to keep backwords compatibility, the methods in HiveMetaHookBase, and
          MetaStoreEventListenerAdapter are overriden.
        • Tests are currently running.

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        AFFECTED FILES
        common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        conf/hive-default.xml.template
        hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
        metastore/src/java/org/apache/hadoop/hive/metastore/AlterHandler.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHookBase.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListenerAdapter.java
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateDatabaseEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropDatabaseEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/DropTableEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/ListenerEvent.java
        metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java
        ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java

        Show
        Phabricator added a comment - enis updated the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". Reviewers: JIRA, ashutoshc, heyongqiang, cwsteinbach Applied review suggestions by Carl Steinbach. Renamesd onXXX methods to postXXX Instead of using MetaStoreEventListener, we now extend HiveMetaHook. Added hooks for all the methods in the metastore. fixed formatting on some lines Note that HiveMetaHook is an interface, and when we add new methods there, the current implementors(hcat) will not compile, unless they add those methods stubs. However, to keep backwords compatibility, the methods in HiveMetaHookBase, and MetaStoreEventListenerAdapter are overriden. Tests are currently running. REVISION DETAIL https://reviews.facebook.net/D1299 AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java conf/hive-default.xml.template hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/AlterHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHookBase.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListenerAdapter.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateDatabaseEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/CreateTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropDatabaseEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/DropTableEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/ListenerEvent.java metastore/src/java/org/apache/hadoop/hive/metastore/events/LoadPartitionDoneEvent.java ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java
        Hide
        Phabricator added a comment -

        cwsteinbach has requested changes to the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".

        This also needs some test coverage. In particular I'd like to see testcases that cover the scenario where multiple hooks are registered.

        INLINE COMMENTS
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:64 Right now pre-methods can also modify the input parameters before they are seen by the metastore. I don't think this should be allowed. We can prevent it by only passing the pre-methods copies of the input parameters, but that may be expensive. At the very least I think the javadoc needs to stipulate that pre-methods are not allowed to mutate the input parameters.
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:54 "and for some of the of the operations"
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:66 If multiple metahooks are registered, then it's also possible that the first metahook will raise an exception and the later metahooks won't see the event at all. This seems pretty bad from the standpoint of a person trying to write a audit plugin, since they risk not seeing a lot of operations that are failed by hooks that precede them in the list.

        Also, what happens if the second prehook throws an exception? Will the first posthook still get called with a failure? It's possible that the first prehook created state that will get cleared by the posthook, but this won't happen if the posthook is not called.

        I think the javadoc either needs to say that you (a) can't assume a posthook will get called just because the prehook was called, or else (b) say that the posthook will always get called if the prehook is called. Option (b) will make it much easier to write hooks at the expense of more complexity on the Hive side.
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:65 What happens if a post-method throws an exception on an operation that has already been committed to the metastore? What does that exception mean, and how is Hive supposed to handle it?
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:883 Where's the prehook? I noticed missing prehooks elsewhere as well.
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:1287 This is a hack and is going to be broken by someone who doesn't understand that you're only calling get_database to trigger the hook.

        Also, the fact that pre/post hooks aren't symmetrical is going to cause a lot of grief for hook implementors. This needs to be fixed.

        This also introduces a time-of-check-to-time-of-use bug since you're not grabbing the DB and table definitions from within the same txn.
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:1458 Is this a separate bug? Is it being tracked in a different JIRA?
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:1624 This convention for triggering the hooks is not acceptable. I think code analysis tools will probably flag this since the db and tbl variables are not read.
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java:970 Implementations of what? Not sure if this method belongs here in the first place.
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:42 What is the plan for removing this interface? Can we remove it on trunk as soon as HCatalog has migrated to the new interface?

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        Show
        Phabricator added a comment - cwsteinbach has requested changes to the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". This also needs some test coverage. In particular I'd like to see testcases that cover the scenario where multiple hooks are registered. INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:64 Right now pre-methods can also modify the input parameters before they are seen by the metastore. I don't think this should be allowed. We can prevent it by only passing the pre-methods copies of the input parameters, but that may be expensive. At the very least I think the javadoc needs to stipulate that pre-methods are not allowed to mutate the input parameters. metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:54 "and for some of the of the operations" metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:66 If multiple metahooks are registered, then it's also possible that the first metahook will raise an exception and the later metahooks won't see the event at all. This seems pretty bad from the standpoint of a person trying to write a audit plugin, since they risk not seeing a lot of operations that are failed by hooks that precede them in the list. Also, what happens if the second prehook throws an exception? Will the first posthook still get called with a failure? It's possible that the first prehook created state that will get cleared by the posthook, but this won't happen if the posthook is not called. I think the javadoc either needs to say that you (a) can't assume a posthook will get called just because the prehook was called, or else (b) say that the posthook will always get called if the prehook is called. Option (b) will make it much easier to write hooks at the expense of more complexity on the Hive side. metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:65 What happens if a post-method throws an exception on an operation that has already been committed to the metastore? What does that exception mean, and how is Hive supposed to handle it? metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:883 Where's the prehook? I noticed missing prehooks elsewhere as well. metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:1287 This is a hack and is going to be broken by someone who doesn't understand that you're only calling get_database to trigger the hook. Also, the fact that pre/post hooks aren't symmetrical is going to cause a lot of grief for hook implementors. This needs to be fixed. This also introduces a time-of-check-to-time-of-use bug since you're not grabbing the DB and table definitions from within the same txn. metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:1458 Is this a separate bug? Is it being tracked in a different JIRA? metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:1624 This convention for triggering the hooks is not acceptable. I think code analysis tools will probably flag this since the db and tbl variables are not read. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java:970 Implementations of what? Not sure if this method belongs here in the first place. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:42 What is the plan for removing this interface? Can we remove it on trunk as soon as HCatalog has migrated to the new interface? REVISION DETAIL https://reviews.facebook.net/D1299
        Hide
        Phabricator added a comment -

        cwsteinbach has commented on the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".

        What effect will these changes have on StorageHandlers that were compiled against older versions of Hive. Will they still work, or will they have to be recompiled?

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        Show
        Phabricator added a comment - cwsteinbach has commented on the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". What effect will these changes have on StorageHandlers that were compiled against older versions of Hive. Will they still work, or will they have to be recompiled? REVISION DETAIL https://reviews.facebook.net/D1299
        Hide
        Phabricator added a comment -

        enis has commented on the revision "HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces".

        The patch as it is, does not change the event listener calling semantics or hivemetahook calling semantics. For something like an audit log, the order for the registered hooks become really important, and it should indeed be the first registered hook. That was one of the reasons that the HiveMetaStore.getDefaultHooks() precede all the user-defined hooks.

        As you suggested we can add transactional semantics to the hook interface, but I guess that would require exposing startTransaction, rollback and commit interfaces, and implementing something like a two phase commit to make sure to keep consistency with the external data. Without it, I don't think we can implement correct behavior.

        Having said that, I am very reluctant to go that way until some real need arises. So what I propose we can do is:

        • Keep the HiveMetaHook as it is in trunk and HiveMetaStoreClient.
        • introduce HiveExtendedMetaHook interface to limited private to replace HiveMetaStoreEventListener, and allow only hive and hcat implementations (by documenting it). We guarantee correct behavior by carefully ordering the hooks.
        • Keep the current semantics as in the patch, but improve on the javadoc for the method semantics.

        Would that work, what do you think?

        INLINE COMMENTS
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:105 Renaming methods from onXXX -> postXX makes sense.

        The MetaStoreEventListener's can interact with the function execution by throwing exceptions, or can act differently on the outcome of the method call (like the HbaseStorageHandler). We can rename MetaStoreEventListener to MetaStoreHook, or else, we can re-purpose HiveMetaHook, and deprecate MetaStoreEventListener instead. Wdyt, any other suggestions?

        I'll update the patch shortly.
        metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:199 I'll update the patch to add all the method hooks.
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:66 The patch as it is, does not change the event listener calling semantics or hivemetahook calling semantics. For something like an audit log, the order for the registered hooks become really important, and it should indeed be the first registered hook. That was one of the reasons that the HiveMetaStore.getDefaultHooks() precede all the user-defined hooks.

        As you suggested we can add transactional semantics to the hook interface, but I guess that would require exposing startTransaction, rollback and commit interfaces, and implementing something like a two phase commit to make sure to keep consistency with the external data. Without it, I don't think we can implement correct behavior.

        Having said that, I am very reluctant to go that way until some real need arises. So what I propose we can do is:

        • Keep the HiveMetaHook as it is in trunk and HiveMetaStoreClient.
        • introduce HiveExtendedMetaHook interface to limited private to replace HiveMetaStoreEventListener, and allow only hive and hcat implementations (by documenting it). We guarantee correct behavior by carefully ordering the hooks.
        • Keep the current semantics as in the patch, but improve on the javadoc for the method semantics.

        Would that work, what do you think?

        REVISION DETAIL
        https://reviews.facebook.net/D1299

        Show
        Phabricator added a comment - enis has commented on the revision " HIVE-2720 [jira] Merge MetaStoreListener and HiveMetaHook interfaces". The patch as it is, does not change the event listener calling semantics or hivemetahook calling semantics. For something like an audit log, the order for the registered hooks become really important, and it should indeed be the first registered hook. That was one of the reasons that the HiveMetaStore.getDefaultHooks() precede all the user-defined hooks. As you suggested we can add transactional semantics to the hook interface, but I guess that would require exposing startTransaction, rollback and commit interfaces, and implementing something like a two phase commit to make sure to keep consistency with the external data. Without it, I don't think we can implement correct behavior. Having said that, I am very reluctant to go that way until some real need arises. So what I propose we can do is: Keep the HiveMetaHook as it is in trunk and HiveMetaStoreClient. introduce HiveExtendedMetaHook interface to limited private to replace HiveMetaStoreEventListener, and allow only hive and hcat implementations (by documenting it). We guarantee correct behavior by carefully ordering the hooks. Keep the current semantics as in the patch, but improve on the javadoc for the method semantics. Would that work, what do you think? INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:105 Renaming methods from onXXX -> postXX makes sense. The MetaStoreEventListener's can interact with the function execution by throwing exceptions, or can act differently on the outcome of the method call (like the HbaseStorageHandler). We can rename MetaStoreEventListener to MetaStoreHook, or else, we can re-purpose HiveMetaHook, and deprecate MetaStoreEventListener instead. Wdyt, any other suggestions? I'll update the patch shortly. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java:199 I'll update the patch to add all the method hooks. metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java:66 The patch as it is, does not change the event listener calling semantics or hivemetahook calling semantics. For something like an audit log, the order for the registered hooks become really important, and it should indeed be the first registered hook. That was one of the reasons that the HiveMetaStore.getDefaultHooks() precede all the user-defined hooks. As you suggested we can add transactional semantics to the hook interface, but I guess that would require exposing startTransaction, rollback and commit interfaces, and implementing something like a two phase commit to make sure to keep consistency with the external data. Without it, I don't think we can implement correct behavior. Having said that, I am very reluctant to go that way until some real need arises. So what I propose we can do is: Keep the HiveMetaHook as it is in trunk and HiveMetaStoreClient. introduce HiveExtendedMetaHook interface to limited private to replace HiveMetaStoreEventListener, and allow only hive and hcat implementations (by documenting it). We guarantee correct behavior by carefully ordering the hooks. Keep the current semantics as in the patch, but improve on the javadoc for the method semantics. Would that work, what do you think? REVISION DETAIL https://reviews.facebook.net/D1299
        Hide
        Carl Steinbach added a comment -

        The patch as it is, does not change the event listener calling semantics or hivemetahook calling semantics.

        Actually, I think this is a pretty significant change. HiveMetaStoreListener is purely a listener interface. All of the methods defined in the interface are triggered after the underlying metastore transaction has been committed or rolled back, which means that all of the methods are effectively "post" methods. Supporting more than one listener is uncomplicated because the listeners don't have the ability to affect the outcome of the operation.

        This patch replaces the listener interface with one that allows plugins to fail metastore operations by raising exceptions in pre-methods. I don't think this interface can support more than one plugin in a maintainable fashion for the reasons I mentioned earlier.

        introduce HiveExtendedMetaHook interface to limited private to replace HiveMetaStoreEventListener, and allow only hive and hcat implementations (by documenting it). We guarantee correct behavior by carefully ordering the hooks.

        I don't see any reason to remove HiveMetaStoreListener. It's useful in its current form for writing audit and logging plugins, the implementation is good, and people are probably already using it since it appeared in the 0.8.0 release.

        As for the proposed HiveExtendedMetaHook interface, in practice I don't think you will be able to guarantee correct behavior by ordering the hooks, and making this a requirement places a significant burden on hook implementors.

        Show
        Carl Steinbach added a comment - The patch as it is, does not change the event listener calling semantics or hivemetahook calling semantics. Actually, I think this is a pretty significant change. HiveMetaStoreListener is purely a listener interface. All of the methods defined in the interface are triggered after the underlying metastore transaction has been committed or rolled back, which means that all of the methods are effectively "post" methods. Supporting more than one listener is uncomplicated because the listeners don't have the ability to affect the outcome of the operation. This patch replaces the listener interface with one that allows plugins to fail metastore operations by raising exceptions in pre-methods. I don't think this interface can support more than one plugin in a maintainable fashion for the reasons I mentioned earlier. introduce HiveExtendedMetaHook interface to limited private to replace HiveMetaStoreEventListener, and allow only hive and hcat implementations (by documenting it). We guarantee correct behavior by carefully ordering the hooks. I don't see any reason to remove HiveMetaStoreListener. It's useful in its current form for writing audit and logging plugins, the implementation is good, and people are probably already using it since it appeared in the 0.8.0 release. As for the proposed HiveExtendedMetaHook interface, in practice I don't think you will be able to guarantee correct behavior by ordering the hooks, and making this a requirement places a significant burden on hook implementors.

          People

          • Assignee:
            Enis Soztutar
            Reporter:
            Enis Soztutar
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development