  1. Hive
  2. HIVE-21761 Support table level replication in Hive
  3. HIVE-21763

Incremental replication to allow changing include/exclude tables list in replication policy.


Details

    Description

      • REPL DUMP takes 2 inputs along with the existing FROM and WITH clauses.
        - REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM <last_repl_id> WITH <key_values_list>;
        - current_repl_policy and previous_repl_policy can be in any format mentioned in Point-4.
        - The REPLACE clause takes the previous repl policy as input. If the REPLACE clause is absent, the policy is treated as unchanged.
        - The rest of the format remains the same.
        
      • Now, REPL DUMP on this DB will replicate the tables based on current_repl_policy.
      • Single table replication of format <db_name>.t1 doesn’t allow changing the policy dynamically, so the REPLACE clause is not allowed if previous_repl_policy is of this format.
      • Any table that is added dynamically, either due to a change in the regular expression or by being added to the include list, should be bootstrapped using an independent table-level replication policy.
        - Hive will automatically figure out the list of newly included tables by comparing the current_repl_policy and previous_repl_policy inputs, and will combine the bootstrap dump for the added tables into the incremental dump. A "_bootstrap" directory can be created in the dump dir to hold all tables to be bootstrapped.
        - If a table is renamed, it may be dynamically added to or removed from replication based on the defined replication policy and include/exclude list. So, Hive will perform a bootstrap for a table that becomes included only after the rename.
        
      • REPL LOAD should check for changes in the repl policy and drop the tables/views that are excluded in the new policy compared to the previous policy. This should be done before performing the incremental and bootstrap load from the current dump.
      • REPL LOAD on an incremental dump should load the event directories first, then check for the "_bootstrap" directory and perform a bootstrap load of the tables in it.
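
The policy comparison and load ordering described above can be sketched roughly as follows. This is an illustration only, not Hive's implementation: ReplPolicy, diff_policies and load_plan are hypothetical names, the policy is modelled as a database name plus include/exclude regular expressions over table names (the actual formats are those of Point-4, which is not reproduced here), and a single table list stands in for both the source and target catalogs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReplPolicy:
    """Hypothetical model of a replication policy (not Hive's own class)."""
    db: str
    include: str = ".*"            # regex over table names to replicate
    exclude: Optional[str] = None  # regex over table names to skip

    def matches(self, table: str) -> bool:
        if not re.fullmatch(self.include, table):
            return False
        return self.exclude is None or not re.fullmatch(self.exclude, table)


def diff_policies(
    current: ReplPolicy,
    previous: Optional[ReplPolicy],
    tables: List[str],
) -> Tuple[List[str], List[str]]:
    """Return (tables_to_bootstrap, tables_to_drop) for a policy change.

    previous=None models a REPL DUMP without a REPLACE clause, in which
    case the policy is treated as unchanged and both lists are empty.
    """
    if previous is None:
        previous = current
    now = {t for t in tables if current.matches(t)}
    before = {t for t in tables if previous.matches(t)}
    return sorted(now - before), sorted(before - now)


def load_plan(
    to_drop: List[str],
    event_dirs: List[str],
    to_bootstrap: List[str],
) -> List[Tuple[str, str]]:
    """Order the REPL LOAD steps: drop newly excluded tables first,
    then apply the event directories, then bootstrap the tables found
    under the "_bootstrap" directory."""
    steps = [("drop", t) for t in to_drop]
    steps += [("apply_event", e) for e in event_dirs]
    steps += [("bootstrap", t) for t in to_bootstrap]
    return steps


# Example: the policy changes from including t1|t2 to including t2|t3.
previous = ReplPolicy("sales", include="t1|t2")
current = ReplPolicy("sales", include="t2|t3")
to_bootstrap, to_drop = diff_policies(current, previous, ["t1", "t2", "t3", "t4"])
plan = load_plan(to_drop, ["101", "102"], to_bootstrap)
```

In this example t3 is newly included and would be bootstrapped as part of the incremental dump, while t1 is no longer covered and would be dropped on the target before any events are applied.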

      Rename table is not in the scope of this JIRA.

      Attachments

        1. HIVE-21763.01.patch
          55 kB
          Sankar Hariappan
        2. HIVE-21763.02.patch
          63 kB
          Sankar Hariappan
        3. HIVE-21763.03.patch
          69 kB
          Sankar Hariappan

            People

              sankarh Sankar Hariappan

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h