HIVE-7973

Hive Replication Support

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Import/Export
    • None
    • replication

    Description

      The need for replication is common across database management systems, and it's important for Hive to evolve support for such a tool as part of its ecosystem. Hive already supports EXPORT and IMPORT commands, which can be used to dump out tables, distcp them to another cluster, and import/create from that. If we had a mechanism by which exports and imports could be automated, it would establish the base on which replication can be developed.
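
The manual sequence described above (export, distcp, import) can be sketched as three commands. This is only an illustration: the class name `ManualTableCopy`, the table name, and the HDFS paths are hypothetical, and the helper merely builds command strings rather than running anything against a cluster.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that lists the manual steps for copying one table. */
final class ManualTableCopy {

    /**
     * @param table      name of the table to copy
     * @param stagingDir export directory on the source cluster (assumed path)
     * @param targetDir  directory on the destination cluster (assumed path)
     */
    static List<String> steps(String table, String stagingDir, String targetDir) {
        return Arrays.asList(
            // 1. In Hive on the source cluster: dump data and metadata.
            "EXPORT TABLE " + table + " TO '" + stagingDir + "'",
            // 2. From a shell: copy the dump to the other cluster.
            "hadoop distcp " + stagingDir + " " + targetDir,
            // 3. In Hive on the destination cluster: create the table from the dump.
            "IMPORT TABLE " + table + " FROM '" + targetDir + "'");
    }

    public static void main(String[] args) {
        // Illustrative values only; neither the table nor the hosts exist.
        steps("page_views",
              "hdfs://src-nn:8020/staging/page_views",
              "hdfs://dst-nn:8020/staging/page_views")
            .forEach(System.out::println);
    }
}
```

Automating replication amounts to generating and running a sequence like this without a person in the loop.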

      One place where this kind of automation can be developed is with the aid of the HiveMetaStoreEventHandler mechanisms: generate notifications when certain changes are committed to the metastore, and then translate those notifications into export actions, distcp actions, and import actions on another cluster.
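
To make the translation step concrete, here is a minimal sketch of mapping a metastore notification to an ordered list of actions. Everything in it is an assumption made for illustration: the `ReplicationSketch` class, the `EventType` and `Action` enums, and the choice that a drop maps to a single drop on the destination. It is not the API of any existing Hive class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: turn a notification type into replication actions. */
final class ReplicationSketch {

    /** Illustrative subset of metastore changes that could raise a notification. */
    enum EventType { CREATE_TABLE, ADD_PARTITION, DROP_TABLE }

    /** Illustrative actions a replication tool might run, in order. */
    enum Action { EXPORT_ON_SOURCE, DISTCP, IMPORT_ON_DESTINATION, DROP_ON_DESTINATION }

    static List<Action> plan(EventType type) {
        switch (type) {
            case CREATE_TABLE:
            case ADD_PARTITION:
                // New data: dump it, copy it across, then load it.
                return Arrays.asList(Action.EXPORT_ON_SOURCE,
                                     Action.DISTCP,
                                     Action.IMPORT_ON_DESTINATION);
            case DROP_TABLE:
                // Assumption: nothing to copy, only mirror the drop.
                return Collections.singletonList(Action.DROP_ON_DESTINATION);
            default:
                throw new IllegalArgumentException("Unhandled event: " + type);
        }
    }

    public static void main(String[] args) {
        for (EventType t : EventType.values()) {
            System.out.println(t + " -> " + plan(t));
        }
    }
}
```

Keeping the mapping in one function means new event types can be supported by adding a case, without touching the code that executes the actions.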

      Part of that already exists in the Notification system that is part of hcatalog-server-extensions. Initially, this was developed to trigger a JMS notification, which an Oozie workflow can use to start off actions keyed on the finishing of a job that used HCatalog to write to a table. While this currently lives under hcatalog, the primary reason for its existence has a scope well past hcatalog alone, and it can be used as-is without the use of HCatalog IF/OF. It can be extended with the help of a library that does the aforementioned translation. I also think that these sections should live in a core hive module, rather than being tucked away inside hcatalog.

      Once we have rudimentary support for table and partition replication, we can move on to further replication requirements, such as metadata replication (for example, replication of changes to roles), and/or optimize away the requirement to distcp and use webhdfs instead.

      This Story tracks all the bits that go into development of such a system - I'll create multiple smaller tasks inside this as we go on.

      Please also see HIVE-10264 for documentation-related links for this, and https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment for the associated wiki (currently in progress).

        Issue Links

          1. Notification Event Listener movement to a new top level repl/ module (Sub-task, Resolved, Sushanth Sowmyan)
          2. Adding in a ReplicationTask that converts a Notification Event to actionable tasks (Sub-task, Closed, Sushanth Sowmyan)
          3. Annotation changes for replication (Sub-task, Closed, Sushanth Sowmyan)
          4. Enable queuing of HCatalog notification events in metastore DB (Sub-task, Resolved, Alan Gates)
          5. Notification message size can be arbitrarily long, DbNotificationListener limits to 1024 (Sub-task, Resolved, Alan Gates)
          6. Add alters to list of events handled by NotificationListener (Sub-task, Resolved, Alan Gates)
          7. Modify HCatClient to support new notification methods in HiveMetaStoreClient (Sub-task, Resolved, Alan Gates)
          8. Add ability for client to request metastore to fire an event (Sub-task, Closed, Alan Gates)
          9. Fire insert event on HCatalog appends (Sub-task, Open, Alan Gates)
          10. Add option to fire metastore event on insert (Sub-task, Closed, Alan Gates)
          11. DbNotificationListener doesn't include dbname in create database notification and does not include tablename in create table notification (Sub-task, Closed, Alan Gates)
          12. ObjectStore.getNextNotification() can return events inside NotificationEventResponse as null which conflicts with its thrift "required" tag (Sub-task, Closed, Sushanth Sowmyan)
          13. Concrete implementation of Export/Import based ReplicationTaskFactory (Sub-task, Closed, Sushanth Sowmyan)
          14. Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics (Sub-task, Closed, Sushanth Sowmyan)
          15. Document Replication support on wiki (Sub-task, Resolved, Shannon Ladymon)
          16. AlterPartitionMessage should return getKeyValues instead of getValues (Sub-task, Closed, Sushanth Sowmyan)
          17. Allow pushing of property-key-value based predicate filter to Metastore dropPartitions (Sub-task, Open, Unassigned)
          18. Rework/simplify ReplicationTaskFactory instantiation (Sub-task, Closed, Sushanth Sowmyan)
          19. Add Event Nullification support for Replication (Sub-task, Open, Unassigned)
          20. Need to add schema upgrade changes for queueing events in the database (Sub-task, Resolved, Alan Gates)

            People

              Sushanth Sowmyan (sushanth)
              Votes: 2
              Watchers: 15