Hive / HIVE-1813

Hive should be able to run on multiple data centers

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently, Hive assumes a single metastore, and HADOOP_HOME is passed as an environment variable.

      It would be desirable to support Hive on top of multiple data centers (DFS + MR).

      For example, there could be two metastores: primary and secondary. They would have different DFSs, and the
      metastore would maintain a DFS -> MR mapping.
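
      The DFS -> MR mapping mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name DfsToMrMapping, its methods, and the example addresses are hypothetical and are not part of Hive or of this proposal.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical registry a metastore could keep: which MR cluster
 * should run jobs over data stored on a given DFS.
 */
final class DfsToMrMapping {
    // Key: DFS URI (assumed format); value: MR cluster address (assumed format).
    private final Map<String, String> mrByDfs = new HashMap<>();

    /** Records that jobs over dfsUri should be submitted to mrAddress. */
    void register(String dfsUri, String mrAddress) {
        mrByDfs.put(dfsUri, mrAddress);
    }

    /** Returns the MR cluster for dfsUri, or fails if none is registered. */
    String mrFor(String dfsUri) {
        String mr = mrByDfs.get(dfsUri);
        if (mr == null) {
            throw new IllegalArgumentException("No MR cluster registered for " + dfsUri);
        }
        return mr;
    }
}
```

      With such a mapping, a query touching a table on a given DFS could be routed to the MR cluster in the same data center instead of relying on a single HADOOP_HOME.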

      Hive would be enhanced to support multiple metastores, and all operations (DDL + query) would span multiple metastores.

      Different pluggable consistency policies can be employed. For example, if a table/partition is present in both metastores with different
      last modification times, either the most recently modified one can be used or an error can be thrown.
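
      The two policies described above can be sketched as a small plug-in interface. This is a minimal sketch, not an existing Hive API: MetastoreEntry, ConsistencyPolicy, LatestWinsPolicy, and FailOnMismatchPolicy are hypothetical names introduced only for illustration.

```java
import java.util.Objects;

/** Hypothetical view of one table/partition as recorded in one metastore. */
final class MetastoreEntry {
    final String metastoreName;     // e.g. "primary" or "secondary" (assumed labels)
    final long lastModifiedMillis;  // last modification time of the object

    MetastoreEntry(String metastoreName, long lastModifiedMillis) {
        this.metastoreName = Objects.requireNonNull(metastoreName);
        this.lastModifiedMillis = lastModifiedMillis;
    }
}

/** Hypothetical plug-in point: picks a copy when both metastores hold the object. */
interface ConsistencyPolicy {
    MetastoreEntry resolve(MetastoreEntry a, MetastoreEntry b);
}

/** Uses the most recently modified copy; on a tie, the first argument wins. */
final class LatestWinsPolicy implements ConsistencyPolicy {
    @Override
    public MetastoreEntry resolve(MetastoreEntry a, MetastoreEntry b) {
        return b.lastModifiedMillis > a.lastModifiedMillis ? b : a;
    }
}

/** Throws when the two copies have different modification times. */
final class FailOnMismatchPolicy implements ConsistencyPolicy {
    @Override
    public MetastoreEntry resolve(MetastoreEntry a, MetastoreEntry b) {
        if (a.lastModifiedMillis != b.lastModifiedMillis) {
            throw new IllegalStateException(
                "Modification times differ between " + a.metastoreName
                    + " and " + b.metastoreName);
        }
        return a;
    }
}
```

      Keeping the policy behind an interface would let a deployment choose the behavior by configuration, without changing the code that queries the metastores.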

      It will be up to the application (outside Hive) to copy the data from one metastore to another and to maintain consistency.

            People

              Assignee: Unassigned
              Reporter: Namit Jain
              Votes: 0
              Watchers: 11