  1. Apache Hudi
  2. HUDI-2432

Fix restore by adding a requested instant and restore plan

Details

    • Task
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • None
    • 0.11.0
    • None

    Description

      Fix restore by adding a requested instant and restore plan

       

      Trying to see if we really need a plan. Dumping my thoughts here. 

      Restore is internally converted to N rollbacks. We fetch the active instants from the timeline in reverse order and trigger the rollbacks one by one. We already have a patch that fixes rollback by adding a rollback plan to the rollback.requested meta file. So, walking through the failure scenarios.
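
      The reverse-order selection described above can be sketched as follows. This is a minimal illustration, not Hudi code: instant times are modelled as plain strings that sort chronologically, and `instants_to_rollback`, `restore` and `rollback_one` are hypothetical names.

```python
def instants_to_rollback(active_instants, restore_to):
    """Return the instants newer than restore_to, newest first."""
    newer = [ts for ts in active_instants if ts > restore_to]
    return sorted(newer, reverse=True)


def restore(active_instants, restore_to, rollback_one):
    """Trigger one rollback per instant, newest first, and collect the results.

    rollback_one is a caller-supplied callable (an assumption of this sketch)
    that rolls back a single instant and returns its rollback metadata.
    """
    return [rollback_one(ts) for ts in instants_to_rollback(active_instants, restore_to)]


if __name__ == "__main__":
    timeline = ["001", "002", "003", "004"]
    # Restoring to "002" rolls back "004" first, then "003".
    print(restore(timeline, "002", lambda ts: f"rolled back {ts}"))
```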

       

      With restore, the individual rollbacks are not published to the timeline. So, if restore fails midway, then in the 2nd attempt only a subset of the rollbacks (those that got rolled back during the 2nd attempt) will be applied to the metadata table. So, we need a plan for restore as well.

      But with our enhancement to rollback to publish a plan, rollback.requested can't be skipped and we have to publish it to the timeline. So, here is what will happen without a restore plan.

       

      start restore

          rollback commit N

                rollback.requested for commit N // plan

                execute rollback, but do not publish to timeline. So this will not get applied to the metadata table.

          rollback commit N-1

                rollback.requested for commit N-1 // plan

                execute rollback, but do not publish to timeline. Again, this will not get applied to the metadata table.

          ...

      commit restore and publish. This will get applied to the metadata table.

      Once we are done committing restore, we can remove all rollback.requested files if needed. 
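
      To make the problem with this flow concrete, here is a small simulation of a restore without a restore plan that fails midway and is retried. It is only a sketch under simplifying assumptions: the active timeline is a list of instant strings, a rollback just removes the commit from that list, and `restore_without_plan`, `fail_after` and `InjectedFailure` are made-up names for illustration.

```python
class InjectedFailure(Exception):
    """Simulates the restore process crashing midway."""


def restore_without_plan(active, restore_to, fail_after=None):
    """Roll back every active commit newer than restore_to, newest first.

    Returns the commits whose rollback metadata would be wrapped into the
    restore commit, i.e. what eventually reaches the metadata table.
    """
    applied = []
    candidates = sorted((c for c in active if c > restore_to), reverse=True)
    for n, commit in enumerate(candidates):
        if fail_after is not None and n == fail_after:
            raise InjectedFailure(f"crashed before rolling back {commit}")
        active.remove(commit)   # commit no longer shows up as active
        applied.append(commit)  # rollback metadata lives in memory only
    return applied


if __name__ == "__main__":
    active = ["001", "002", "003", "004", "005"]
    try:
        restore_without_plan(active, "001", fail_after=2)
    except InjectedFailure:
        pass  # "005" and "004" are rolled back, but nothing was published
    # The retry only sees "003" and "002"; "005" and "004" are lost.
    print(restore_without_plan(active, "001"))
```

      In this model the second attempt has no way of knowing that two commits were already rolled back, which is the gap a restore plan is meant to close.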

       

      Failure scenarios: 

      Suppose we fail after 2 rollbacks.

      On re-attempt, we will process only the remaining commits, since the active timeline may no longer report commit N and commit N-1 as active. So, we can do something like below with a restore plan.

       

      1. Start restore.

      2. Schedule rollback for all of them.

              Serialize all commit instants that need to be rolled back, along with their rollback plans. // by now, we would have created a rollback.requested meta file for every commit that needs to be rolled back.

      3. Now execute the rollbacks one by one. // do not publish to the timeline once done. Also, the changes should not be applied to the metadata table.

      4. Collect the rollback commit metadata from all individual rollbacks and create the restore commit metadata. There could be some commits which were already rolled back, and for those we need to manually create the rollback metadata based on the rollback plan. More details in the next paragraph. Commit the restore and publish. Only this will get applied to the metadata table (which in turn will unwrap the individual rollback metadata and apply it to the metadata table).
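
      The four steps above could look roughly like this for the no-failure case. This is a simplified sketch, not the actual Hudi implementation: `RestorePlan`, `schedule_restore` and `execute_restore` are hypothetical names, a rollback plan is reduced to the list of files to delete, and the table is a dict from commit to the files it wrote.

```python
from dataclasses import dataclass, field


@dataclass
class RestorePlan:
    restore_to: str
    # commit -> files its rollback will delete (stand-in for rollback.requested)
    rollback_plans: dict = field(default_factory=dict)


def schedule_restore(files_by_commit, restore_to):
    """Step 2: record every commit to roll back, newest first, with its plan."""
    plan = RestorePlan(restore_to)
    for commit in sorted(files_by_commit, reverse=True):
        if commit > restore_to:
            plan.rollback_plans[commit] = list(files_by_commit[commit])
    return plan


def execute_restore(plan, files_by_commit):
    """Steps 3 and 4: run each rollback without publishing it, then return the
    combined metadata that would be committed once as the restore metadata."""
    restore_metadata = {}
    for commit, files in plan.rollback_plans.items():
        files_by_commit.pop(commit)       # rollback executed, not published
        restore_metadata[commit] = files  # collected for the restore commit
    return restore_metadata


if __name__ == "__main__":
    table = {"001": ["a.parquet"], "002": ["b.parquet"], "003": ["c.parquet"]}
    plan = schedule_restore(table, "001")
    print(execute_restore(plan, table))
```

      Because the plan is written before any rollback runs, the full list of commits stays available even after some of them have disappeared from the active timeline.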

       

      Failures:

      If we fail after the 2nd rollback:

      On the 2nd attempt, we will look at the restore plan for all commits that need to be rolled back. We can't really roll back the first 2, since they are already rolled back. So, for those, we will manually create the rollback metadata from the rollback.requested meta file. For the rest, we will follow the regular flow of executing the actual rollback and collecting the rollback metadata. Once complete, we will serialize all this info in the restore metadata, which gets applied to the metadata table.
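
      A retry along these lines might be sketched as below. The names are illustrative assumptions rather than Hudi APIs: `rollback_plans` stands in for the per-commit plans stored in the rollback.requested files, `active_commits` is the set of commits still on the active timeline, and `execute_rollback` is a caller-supplied function that performs a real rollback and returns its metadata.

```python
def resume_restore(rollback_plans, active_commits, execute_rollback):
    """Build the restore metadata on a re-attempt.

    rollback_plans: dict of commit -> planned files to delete, newest first.
    Commits missing from active_commits were rolled back by an earlier
    attempt, so their metadata is rebuilt from the plan instead of re-run.
    """
    restore_metadata = {}
    for commit, planned_files in rollback_plans.items():
        if commit in active_commits:
            # Regular flow: execute the rollback and collect its metadata.
            restore_metadata[commit] = execute_rollback(commit, planned_files)
            active_commits.discard(commit)
        else:
            # Already rolled back: derive the metadata from the plan.
            restore_metadata[commit] = {
                "deleted_files": list(planned_files),
                "reconstructed": True,
            }
    return restore_metadata


if __name__ == "__main__":
    plans = {"005": ["e"], "004": ["d"], "003": ["c"]}
    active = {"001", "002", "003"}  # "005" and "004" are already gone

    def run(commit, files):
        return {"deleted_files": list(files), "reconstructed": False}

    print(resume_restore(plans, active, run))
```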

       

      Alternatives: Since restore is a destructive operation anyway, and users are advised to stop all other processes while it runs, we also have the option of cleaning up the metadata table and re-bootstrapping it completely once restore is complete.

       

       

       

            People

              shivnarayan sivabalan narayanan
              shivnarayan sivabalan narayanan
              Vinoth Chandar