  Apache Hudi / HUDI-6609

Fix multi-writer with deltastreamer checkpointing

Details

    Description

      As of now, deltastreamer stores its checkpoint in the commit metadata of each write.

      To support multiple writers (multiple deltastreamers), we added support some time back for storing the checkpoint as a map with one entry per writer, keyed by a writer identifier:
       

      { "writer1" = "checkpointVal1", "writer2" = "checkpointVal2" }
       
      But this incurs additional locking: every time a writer needs to update its checkpoint, it has to reload the timeline and fetch the latest known commit metadata, so that the other writers' entries are carried forward into the new map.
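
The locking cost of the shared-map approach can be sketched as follows. This is a toy model, not Hudi code: `timeline`, `timeline_lock`, and `commit_with_shared_map` are hypothetical names, and the timeline is reduced to an in-memory list of commit metadata dictionaries.

```python
import threading

# Hypothetical stand-in for the Hudi timeline: commit metadata, oldest first.
timeline = []
# Hypothetical stand-in for the table-level lock shared by all writers.
timeline_lock = threading.Lock()


def commit_with_shared_map(writer_id, new_checkpoint):
    """Commit while carrying every writer's checkpoint in one shared map."""
    with timeline_lock:
        # The reload must happen under the lock: reading the map earlier
        # could miss an entry another writer committed in the meantime,
        # and that entry would then be dropped from the new commit.
        latest = dict(timeline[-1]["checkpoints"]) if timeline else {}
        latest[writer_id] = new_checkpoint
        timeline.append({"writer": writer_id, "checkpoints": latest})


commit_with_shared_map("writer1", "checkpointVal1")
commit_with_shared_map("writer2", "checkpointVal2")
```

In this model the latest commit always holds the full map, which is why every update has to read the newest commit metadata before writing.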
       
      Instead, we can decouple the checkpoints.
      Each writer updates only its own checkpoint. When fetching the latest known checkpoint for a writer, we may need to walk back the timeline to find the most recent commit that carries that writer's checkpoint.
       
      For example:
      commit1 by writer1: commit metadata ➝ {writer1 = checkpointVal1}
      commit2 by writer2: commit metadata ➝ {writer2 = checkpointValA}
       
      commit3 by writer1: to fetch the latest checkpoint for writer1, we walk back the timeline and fetch the checkpoint of interest. Even though the latest commit metadata (commit2) has a checkpoint, its key refers to writer2, so we need to go back further and fetch the checkpoint from commit1.
      Finally, writer1 updates the commit metadata to {writer1 = checkpointVal2}.
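
The walk-back lookup in this example can be sketched as below. It is a minimal illustration under assumptions, not the deltastreamer implementation: `latest_checkpoint` is a hypothetical helper, and each commit's metadata is modeled as a dictionary holding only the committing writer's checkpoint.

```python
from typing import Optional


def latest_checkpoint(timeline, writer_id) -> Optional[str]:
    """Walk the timeline newest-first and return the first checkpoint
    recorded under writer_id, or None if that writer never committed."""
    for commit_metadata in reversed(timeline):
        checkpoints = commit_metadata.get("checkpoints", {})
        if writer_id in checkpoints:
            return checkpoints[writer_id]
    return None


# Timeline from the example above, oldest first.
timeline = [
    {"commit": "commit1", "checkpoints": {"writer1": "checkpointVal1"}},
    {"commit": "commit2", "checkpoints": {"writer2": "checkpointValA"}},
]

# writer1 skips commit2 (it belongs to writer2) and resumes from commit1.
resume_from = latest_checkpoint(timeline, "writer1")

# writer1's next commit records only its own new checkpoint.
timeline.append({"commit": "commit3", "checkpoints": {"writer1": "checkpointVal2"}})
```

The trade-off in this sketch is that reads may scan several commits, while a write no longer needs another writer's entry in order to build its own metadata.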
       
       
      Also, please check when the multiple-checkpoint support was added. If it was added before 0.13.0, we need to ensure the fix is backwards compatible as well. If not, we are good: just fixing the existing solution would suffice.

            People

              Assignee: Unassigned
              Reporter: sivabalan narayanan (shivnarayan)