  1. Hive
  2. HIVE-21761 Support table level replication in Hive
  3. HIVE-21763

Incremental replication to allow changing include/exclude tables list in replication policy.

Details

    Description

      • REPL DUMP takes two policy inputs in addition to the existing FROM and WITH clauses.
        - REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM <last_repl_id> WITH <key_values_list>;
        - current_repl_policy and previous_repl_policy can be in any format mentioned in Point-4.
        - The REPLACE clause is supported so that the previous replication policy can be passed as input. If the REPLACE clause is absent, the policy is treated as unchanged.
        - The rest of the format remains the same.
        
      • REPL DUMP on this DB will then replicate tables based on current_repl_policy.
      • Single-table replication of the form <db_name>.t1 doesn’t allow changing the policy dynamically, so the REPLACE clause is not allowed if previous_repl_policy is of this form.
      • Any table that is added dynamically, whether through a change in the regular expression or by being added to the include list, should be bootstrapped using an independent table-level replication policy.
        - Hive will automatically determine the newly included tables by comparing the current_repl_policy and previous_repl_policy inputs, and will combine the bootstrap dump for those tables with the incremental dump. A "_bootstrap" directory can be created in the dump directory to hold all tables to be bootstrapped.
        - If a table is renamed, it may be dynamically added to or removed from replication based on the defined replication policy and include/exclude list. Hive will therefore bootstrap any table that becomes included only after the rename.
        
      • REPL LOAD should check for changes in the replication policy and drop the tables/views that the new policy excludes but the previous policy included. This should be done before performing the incremental and bootstrap load from the current dump.
      • REPL LOAD on an incremental dump should load the event directories first, then check for a "_bootstrap" directory and perform a bootstrap load on the tables in it.
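
The policy comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hive's implementation: the `ReplPolicy` class, its `include`/`exclude` regex tuples, and the `diff_policies` and `load_plan` helpers are hypothetical names, and the policy format is an assumption, since the actual formats are defined in Point-4 of the parent issue. The sketch also ignores table renames and compares both policies against a single list of table names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplPolicy:
    """Hypothetical policy model: a database plus include/exclude regexes."""
    db: str
    include: tuple = (".*",)
    exclude: tuple = ()

    def matches(self, table: str) -> bool:
        # A table is replicated if some include pattern matches it
        # and no exclude pattern does.
        included = any(re.fullmatch(p, table) for p in self.include)
        excluded = any(re.fullmatch(p, table) for p in self.exclude)
        return included and not excluded


def diff_policies(tables, current, previous=None):
    """Return (to_bootstrap, to_drop) for the given table names.

    With no previous policy (no REPLACE clause) the policy is treated
    as unchanged, so nothing is bootstrapped or dropped.
    """
    if previous is None:
        return set(), set()
    now = {t for t in tables if current.matches(t)}
    before = {t for t in tables if previous.matches(t)}
    return now - before, before - now


def load_plan(to_drop, has_bootstrap_dir):
    """Order of REPL LOAD steps as described in the issue text."""
    steps = []
    if to_drop:
        steps.append("drop newly excluded tables/views")
    steps.append("load event directories")
    if has_bootstrap_dir:
        steps.append("bootstrap load from _bootstrap")
    return steps


previous = ReplPolicy("sales", include=("t1", "t2"))
current = ReplPolicy("sales", include=("t[0-9]+",), exclude=("t1",))
to_bootstrap, to_drop = diff_policies(["t1", "t2", "t3", "tmp_x"], current, previous)
# to_bootstrap == {"t3"}, to_drop == {"t1"}
print(load_plan(to_drop, has_bootstrap_dir=bool(to_bootstrap)))
```

Here `t3` is newly matched by the current policy, so it would go into the "_bootstrap" directory of the incremental dump, while `t1` is newly excluded and would be dropped on the target before the events are loaded.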

      Rename table is not in scope of this jira.

      Attachments

        1. HIVE-21763.03.patch
          69 kB
          Sankar Hariappan
        2. HIVE-21763.02.patch
          63 kB
          Sankar Hariappan
        3. HIVE-21763.01.patch
          55 kB
          Sankar Hariappan

          People

            Assignee: Sankar Hariappan (sankarh)
            Reporter: Sankar Hariappan (sankarh)
            Votes: 0
            Watchers: 2

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 2.5h
