Hive
  1. Hive
  2. HIVE-1881

Make the MetaStore filesystem interface pluggable via the hive.metastore.fs.handler.class configuration property

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: Metastore
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      >>@Yongqiang: What's the motivation for doing this?
      This is to work with some internal hacky codes about doing delete. There should be no impact if you use open source hadoop.

      But the idea here is to give users 2 options to do the delete. In Facebook, we have some customized code in FsShell which can control whether the delete should go through trash or not.

      1. HIVE-1881.3.patch
        8 kB
        He Yongqiang
      2. HIVE-1881.2.patch
        7 kB
        He Yongqiang
      3. HIVE-1881.1.patch
        8 kB
        He Yongqiang

        Issue Links

          Activity

          Hide
          Carl Steinbach added a comment -

          The new configuration property "hive.metastore.fs.handler.class" needs to get added to hive-default.xml
          along with a short description explaining what it does.

          The other problem I see is that the MetaStoreFS interface that this patch added is incomplete –
          it only defines a deleteDir() method. What about createDir(), isDir(), etc? It seems like any MetaStore
          code that directly references Hadoop's FileSystem class needs to get moved into HiveMetaStoreFsImpl.

          I don't think this needs to be done right away, but the problem with waiting is that any changes to MetaStoreFS
          will break classes that implement older versions of that interface. Since MetaStoreFS is a public API we
          need to get this done before it appears in an official release.

          Show
          Carl Steinbach added a comment - The new configuration property "hive.metastore.fs.handler.class" needs to get added to hive-default.xml along with a short description explaining what it does. The other problem I see is that the MetaStoreFS interface that this patch added is incomplete – it only defines a deleteDir() method. What about createDir(), isDir(), etc? It seems like any MetaStore code that directly references Hadoop's FileSystem class needs to get moved into HiveMetaStoreFsImpl. I don't think this needs to be done right away, but the problem with waiting is that any changes to MetaStoreFS will break classes that implement older versions of that interface. Since MetaStoreFS is a public API we need to get this done before it appears in an official release.
          Hide
          Namit Jain added a comment -

          Committed. Thanks Yongqiang

          Show
          Namit Jain added a comment - Committed. Thanks Yongqiang
          Hide
          Namit Jain added a comment -

          +1

          Show
          Namit Jain added a comment - +1
          Hide
          Namit Jain added a comment -

          We still need to create an interface, which can have a different implementation in facebook

          Show
          Namit Jain added a comment - We still need to create an interface, which can have a different implementation in facebook
          Hide
          Edward Capriolo added a comment -

          Then can we CLOSE with WONT FIX ?

          Show
          Edward Capriolo added a comment - Then can we CLOSE with WONT FIX ?
          Hide
          Namit Jain added a comment -

          Talking offline with Yongqiang, the facebook specific implementation of this interface need not be checked in
          open source, nor is there any need to document the new configuration parameter in open source, since this
          parameter only makes sense in the facebook enviroment

          Show
          Namit Jain added a comment - Talking offline with Yongqiang, the facebook specific implementation of this interface need not be checked in open source, nor is there any need to document the new configuration parameter in open source, since this parameter only makes sense in the facebook enviroment
          Hide
          Namit Jain added a comment -

          Seems like the correct thing to do

          Show
          Namit Jain added a comment - Seems like the correct thing to do
          Hide
          Carl Steinbach added a comment -

          I'm concerned that this patch introduces two new configuration properties that don't make sense
          to anyone outside of Facebook. I think we need to avoid doing this since it makes the configuration
          process more complicated (it's already complicated enough), and also introduces an untested
          code path.

          Instead, I'd like to propose that we define a MetaStoreFs interface that defines createDir and
          deleteDir methods, etc, along with a default implementation and the ability to plug in other
          implementations by setting a new hive.metastore.fs.impl configuration property.

          What do you think?

          Show
          Carl Steinbach added a comment - I'm concerned that this patch introduces two new configuration properties that don't make sense to anyone outside of Facebook. I think we need to avoid doing this since it makes the configuration process more complicated (it's already complicated enough), and also introduces an untested code path. Instead, I'd like to propose that we define a MetaStoreFs interface that defines createDir and deleteDir methods, etc, along with a default implementation and the ability to plug in other implementations by setting a new hive.metastore.fs.impl configuration property. What do you think?
          Hide
          He Yongqiang added a comment -

          I don't understand the motivation for this change, but assuming that
          FsShell provides features unavailable in FileSystem, is there any reason
          why we can't replace the FileSystem based implementation with the new
          one that uses FsShell?

          Yeah, we can replace it completely. But there is an overhead of using FsShell since it requires a new process. We just want to go to the new code path only needed. For normal ones, just keep the old behavior.

          I think you can simplify the interface by making getSessionConfStore() private,
          and then calling it from getConf() and setConf() which can now be made static. Then
          you'll be able to call

          SessionConfStore.getConf()

          instead of

          SessionConfStore.getSessionConfStore().getConf()

          will do it and upload a new patch.

          Thanks!

          Show
          He Yongqiang added a comment - I don't understand the motivation for this change, but assuming that FsShell provides features unavailable in FileSystem, is there any reason why we can't replace the FileSystem based implementation with the new one that uses FsShell? Yeah, we can replace it completely. But there is an overhead of using FsShell since it requires a new process. We just want to go to the new code path only needed. For normal ones, just keep the old behavior. I think you can simplify the interface by making getSessionConfStore() private, and then calling it from getConf() and setConf() which can now be made static. Then you'll be able to call SessionConfStore.getConf() instead of SessionConfStore.getSessionConfStore().getConf() will do it and upload a new patch. Thanks!
          Hide
          Carl Steinbach added a comment -
          Show
          Carl Steinbach added a comment - Review posted here: https://reviews.apache.org/r/210/

            People

            • Assignee:
              He Yongqiang
              Reporter:
              He Yongqiang
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development