  1. Hive
  2. HIVE-18320 Support ACID Tables Replication
  3. HIVE-19927

Last Repl ID set by bootstrap dump is incorrect and may cause data loss if ACID/MM tables are present.


Details

    Description

      During bootstrap dump of ACID tables, consider the following sequence of events.

      • Current session (REPL DUMP) opens a txn (Txn1) - Event-10
      • Another session (Session-2) opens a txn (Txn2) - Event-11
      • Session-2 inserts data (T1.D1) into an ACID table - Event-12
      • REPL DUMP gets lastReplId = last event ID logged (Event-12)
      • Session-2 commits Txn2 - Event-13
      • REPL DUMP dumps the ACID tables using the validTxnList based on Txn1. This step skips all data written by txns > Txn1, so T1.D1 will be missing from the dump.
      • REPL DUMP commits Txn1
      • REPL LOAD from the bootstrap dump will not include T1.D1.
      • Incremental REPL DUMP will start from Event-13 and hence lose Txn2, which was opened after Txn1 (its open and insert events, Event-11 and Event-12, are never replayed). So the data T1.D1 will be lost forever.

      The proposed fix is to capture the lastReplId of the bootstrap dump before opening the current txn (Txn1), store it in the Driver context, and use it for the dump.
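
      The difference between the two orderings can be illustrated with a toy replay of the event sequence above. This is only a sketch and not Hive code: the class ReplIdRaceSketch, the method uncoveredTxn2Events, and the starting event ID of 9 are all hypothetical, and the notification log is reduced to a plain counter. It assumes that the bootstrap dump contains nothing from Txn2 (because of the Txn1 snapshot) and that the incremental dump replays only events with an ID greater than lastReplId.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names exist in Hive.
public class ReplIdRaceSketch {

    /** Returns Txn2's event IDs that neither the bootstrap nor the next incremental dump covers. */
    static List<Long> uncoveredTxn2Events(boolean captureIdBeforeOpeningTxn1) {
        long eventId = 9;              // assumed: last event logged before REPL DUMP starts
        long lastReplId = -1;
        List<Long> txn2Events = new ArrayList<>();

        if (captureIdBeforeOpeningTxn1) {
            lastReplId = eventId;      // proposed ordering: capture before opening Txn1
        }
        eventId++;                     // Event-10: REPL DUMP opens Txn1
        txn2Events.add(++eventId);     // Event-11: Session-2 opens Txn2
        txn2Events.add(++eventId);     // Event-12: Session-2 inserts T1.D1
        if (!captureIdBeforeOpeningTxn1) {
            lastReplId = eventId;      // reported ordering: capture after Txn1 is open
        }
        txn2Events.add(++eventId);     // Event-13: Session-2 commits Txn2

        // The bootstrap dump uses Txn1's snapshot, so it holds nothing from Txn2.
        // The incremental dump replays only events with ID > lastReplId.
        List<Long> uncovered = new ArrayList<>();
        for (long id : txn2Events) {
            if (id <= lastReplId) {
                uncovered.add(id);
            }
        }
        return uncovered;
    }

    public static void main(String[] args) {
        System.out.println("reported ordering: " + uncoveredTxn2Events(false));
        System.out.println("proposed ordering: " + uncoveredTxn2Events(true));
    }
}
```

      With the reported ordering, the open and insert events of Txn2 fall at or below lastReplId and are never replayed; with the proposed ordering, every event of Txn2 is above lastReplId, so the first incremental dump picks up the whole transaction.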

      Attachments

        1. HIVE-19927.01.patch
          22 kB
          Sankar Hariappan
        2. HIVE-19927.01-branch-3.patch
          27 kB
          Sankar Hariappan
        3. HIVE-19927.02.patch
          26 kB
          Sankar Hariappan
        4. HIVE-19927.03.patch
          26 kB
          Sankar Hariappan
        5. HIVE-19927.04.patch
          26 kB
          Sankar Hariappan

            People

              sankarh Sankar Hariappan