Apache Hudi / HUDI-5647

Automate savepoint and restore tests

    Description

      Automate savepoint and restore tests

      Scenarios to cover:

       

      All tests are to be run for:
      with and without the metadata table enabled
      partitioned and non-partitioned datasets. 
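
The combinations above can be enumerated mechanically so that no variant is skipped. The following is only a sketch: the constant and function names are hypothetical, and the table types are taken from the COW and MOR sections of this ticket.

```python
from itertools import product

# Hypothetical dimension names; the ticket only lists the dimensions themselves.
TABLE_TYPES = ["COW", "MOR"]
METADATA_ENABLED = [True, False]
PARTITIONED = [True, False]


def test_matrix():
    """Return every (table_type, metadata_enabled, partitioned) combination."""
    return list(product(TABLE_TYPES, METADATA_ENABLED, PARTITIONED))


for table_type, metadata, partitioned in test_matrix():
    layout = "partitioned" if partitioned else "non-partitioned"
    print(f"{table_type}: metadata={'on' if metadata else 'off'}, {layout}")
```

Each scenario listed below would then be run once per combination.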

      COW

      Format:
      scenario being tested
      timeline 
      what to expect after restore. 

      1. straightforward case. 
      C1, C2, savepoint C2. C3, C4, restore. 
      should go back to C2. 
      C3, C4 should be cleaned up. 

      2. pending inflight. 
      C1, C2, savepoint C2. C3, C4 inflight. restore. 
      should go back to C2. 
      C3, C4 should be cleaned up. 

      3. completed rollbacks in timeline. 
      C1, C2, savepoint C2, C3, C4 (RB_C3), C5. restore. 
      should go back to C2. 
      C3, C4(RB_C3), C5 should be cleaned up. 

      4. pending rollbacks after savepoint. 

      C1, C2, savepoint C2, C3, C4 (RB_C3) inflight. restore. 
      should go back to C2. 
      C3, C4 (RB_C3) should be cleaned up. 

      5. clean commits after savepoint. 
      C1, C2, savepoint C2, C3, C4, C5 (clean C1), C6, restore
      should go back to C2. 
      C3, C4, C5 (clean C1), C6 should be cleaned up.

      6. clustering. 
      C1, C2, savepoint C2. C3, C4.replacecommit, C5, restore. 
      should go back to C2. 
      C3, C4.replacecommit, C5 should be cleaned up. 

      7. pending clustering after savepoint. 
      C1, C2, savepoint C2. C3, C4.replacecommit.inflight, C5, restore. 
      should go back to C2.
      C3, C4.replacecommit files and C5 files should be cleaned up. 

      8. completed clustering before savepoint. 
      C1, C2, C3.replacecommit.complete, C4, savepoint C4, C5, restore. 
      should go back to C4.
      C5 should be cleaned up. 

      9. pending clustering before savepoint. 
      C1, C2, C3.replacecommit.inflight, C3, C4, savepoint C4, C5, restore 
      should go back to C4. 
      C5 should be cleaned up. if the pipeline is restarted, C3.replacecommit should be re-attempted. 
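
The COW expectations above all follow the same pattern: instants up to and including the savepointed commit stay, and everything after it goes. A minimal sketch of that expectation as a checkable helper follows. It is a toy model over instant names, not Hudi code, and the function name is hypothetical.

```python
def expected_after_restore(timeline, savepoint):
    """Split a timeline into (kept, removed) for a restore to `savepoint`.

    Toy model only: instants are strings in timeline order, and everything
    after the savepointed instant is expected to be removed, whether it is
    completed or still inflight.
    """
    idx = timeline.index(savepoint)  # raises ValueError if the savepoint is missing
    return timeline[: idx + 1], timeline[idx + 1:]


# COW scenario 3: completed rollback in the timeline.
kept, removed = expected_after_restore(
    ["C1", "C2", "C3", "C4(RB_C3)", "C5"], savepoint="C2")
print(kept)     # ['C1', 'C2']
print(removed)  # ['C3', 'C4(RB_C3)', 'C5']
```

An automated test could compare the instants left on the real timeline after restore against the `kept` list.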

      MOR 

      1. simple one
      DC1, DC2, DC3, savepoint DC3. DC4, DC5. restore
      should rollback DC4 and DC5 
      No files will be cleaned up. only rollback log appends. 

      2. simple one w/ compaction. 
      DC1, DC2, DC3, C4, savepoint C4. DC5, DC6. restore
      should rollback DC5 and DC6 
      No files will be cleaned up. only rollback log appends. 

      3. another one w/ compaction. 
      DC1, DC2, DC3, savepoint DC3, DC4, C5, DC6, DC7. restore
      should roll back DC4, C5, DC6 and DC7. 
      latest file slice should be fully cleaned up, with rollback log appends for DC4 in the first file slice. 

      4. compaction and clean commits. 
      DC1, DC2, DC3, savepoint DC3, DC4, C5, DC6, DC7, DC8, C9, C10.clean, DC11, DC12 restore. 
      should take the table back to DC3. 
      Cleaner should not have cleaned up file slice 1 since it was part of savepoint. Entire file slice 2 and 3 should be cleaned up. 
      i.e. C5, DC6, DC7, DC8, C9, C10.clean, DC11, DC12. and a rollback log append for DC4. 

      5. pending compaction after savepoint. 
      DC1, DC2, DC3, savepoint DC3, DC4, C5.pending. DC6, DC7. restore
      should rollback until DC3. 
      latest file slice should be fully deleted. for DC4 a rollback log append should be made. 

      6. pending compaction before savepoint. 
      DC1, DC2, DC3, C4.pending, DC5, savepoint DC5, DC6, DC7. restore
      should rollback until DC5. 
      rollback log appends for DC6 and DC7. 

      7. compaction and clustering. completed clustering before savepoint. 
      DC1, DC2, DC3, C4, DC5, C6.replacecommit.completed. DC7, savepoint DC7, DC8, DC9. restore
      inspect what C6 does. likely it will create a new file group and then start taking in DC7. 
      should take the table back to DC7. 
      rollback log appends for DC8 and DC9. 

      8. compaction and clustering. completed clustering after savepoint. 
      DC1, DC2, DC3, C4, DC5, savepoint DC5, C6.replacecommit.completed, DC7, DC8, restore
      inspect what C6 does. likely it will create a new file group and then start taking in DC7. 
      should take the table back to DC5. 
      latest file slice created by C6 should be cleaned up fully. 

      9. pending clustering before savepoint. 
      DC1, DC2, DC3, C4, DC5, C6.replacecommit.inflight. DC7, savepoint DC7, DC8, DC9. restore
      should take the table back to DC7. 
      rollback log appends for DC8 and DC9. when pipeline is restarted, C6 should be re-attempted and get to completion. 

      10. pending clustering after savepoint. 
      DC1, DC2, DC3, C4, DC5, savepoint DC5, C6.replacecommit.inflight, DC7, DC8, restore
      should take the table back to DC5. 
      latest file slice created by C6 should be cleaned up fully. 

      11. completed rollbacks after savepoint. 
      DC1, DC2, DC3, C4, savepoint C4. DC5, C6(RB_DC5), DC7. restore
      should roll back DC5, C6 and DC7. 
      No files will be cleaned up. only rollback log appends. 
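
For the MOR scenarios that involve only delta commits and compactions (for example 1, 3 and 5), the expected outcome can be sketched as a small classifier. This is a toy model over instant names inferred from the expectations listed above, not Hudi behaviour verified against code. The function name is hypothetical, and rollback, clean and clustering instants are out of its scope.

```python
def mor_restore_plan(instants_after_savepoint):
    """Classify instants after the savepoint for a MOR restore (toy model).

    Delta commits (names starting with "DC") that precede any compaction are
    expected to be undone with rollback log appends. The first compaction
    (names starting with "C") opens a new file slice, so it and everything
    after it are expected to be removed by deleting that slice.
    """
    log_appends, deleted_with_slice = [], []
    new_slice_started = False
    for instant in instants_after_savepoint:
        if not instant.startswith("DC"):
            new_slice_started = True
        if new_slice_started:
            deleted_with_slice.append(instant)
        else:
            log_appends.append(instant)
    return log_appends, deleted_with_slice


# MOR scenario 3: savepoint DC3, then DC4, C5, DC6, DC7.
print(mor_restore_plan(["DC4", "C5", "DC6", "DC7"]))
# (['DC4'], ['C5', 'DC6', 'DC7'])
```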

       

      Few more cases to test:

       

      case 1:
      rolling back a commit that's already cleaned up: 
      C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9 (cleaned up C1, C2, C3, C5), C10, restore. 

      case 2: 
      inflight clean after savepoint which is supposed to clean up files pertaining to a commit that will be rolled back by restore. 
      C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9.inflight (cleaned up C1, C2, C3, C5), C10, restore. 

      after restore:
      C1, C2, C3, C4, SP_C4, cleaner_C9.inflight 
      at some point, cleaner will retry. 

      Fix: restore should first finish any pending clean after savepoint and then start the restore. 

       

      More cases:

      12:
      rolling back a commit that's already cleaned up: 
      C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9 (cleaned up C1, C2, C3, C5), C10, restore. 

      13: 
      inflight clean after savepoint which is supposed to clean up files pertaining to a commit that will be rolled back by restore. 
      C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9.inflight (cleaned up C1, C2, C3, C5), C10, restore. 

      after restore:
      C1, C2, C3, C4, SP_C4, cleaner_C9.inflight 
      at some point, cleaner will retry. 

      When cleaner retries, it does succeed w/o any issues. 

       

            People

              danny0405 Danny Chen
              shivnarayan sivabalan narayanan