  1. Apache Hudi
  2. HUDI-5077

Supporting multiple deltastreamers writing to a single hudi table

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: deltastreamer
    • Labels: None

    Description

      Currently, only a single deltastreamer can write to a given Hudi table. The community has asked for support for two deltastreamers writing to the same table.

       

      Changes required:

      1. Checkpointing needs to store multiple key-value pairs, where the key is a unique identifier for the deltastreamer client and the value is that client's checkpoint. This may require introducing a new notion of an identifier for each deltastreamer.
      2. Within delta sync, after writeClient.upsert and before writeClient.commit, we need to update the checkpoint value. To do this, we may need to take a lock, fetch the latest checkpoint from the timeline (since there could be multiple writers), update the checkpoint, and then release the lock.
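
The two changes above could be sketched roughly as follows. This is a minimal, in-memory illustration only, not Hudi code: the class MultiWriterCheckpointSketch, its methods, the writer identifiers, and the checkpoint strings are all hypothetical. A local ReentrantLock stands in for whatever table-level lock would actually be used, and a plain map stands in for the checkpoint metadata read from the timeline.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; none of these types or methods are Hudi APIs. */
class MultiWriterCheckpointSketch {

    // Stand-in for a table-level lock. Writers in separate processes
    // would need a distributed lock rather than an in-JVM one.
    private final ReentrantLock tableLock = new ReentrantLock();

    // Stand-in for the checkpoint map stored with the latest commit on the
    // timeline: writer identifier -> that writer's checkpoint.
    private Map<String, String> latestCommittedCheckpoints = new HashMap<>();

    /**
     * Intended to run after the upsert and before the commit: take the lock,
     * re-read the latest checkpoints, replace only this writer's entry,
     * "commit" the merged map, and release the lock.
     */
    Map<String, String> commitWithCheckpoint(String writerId, String newCheckpoint) {
        tableLock.lock();
        try {
            // Another writer may have committed since this writer started,
            // so start from the latest committed state, not a cached copy.
            Map<String, String> merged = new HashMap<>(latestCommittedCheckpoints);
            merged.put(writerId, newCheckpoint);
            // Stands in for writing the merged map with the commit.
            latestCommittedCheckpoints = merged;
            return Collections.unmodifiableMap(merged);
        } finally {
            tableLock.unlock();
        }
    }

    /** Returns the checkpoint a given writer should resume from, or null if it has none. */
    String checkpointFor(String writerId) {
        tableLock.lock();
        try {
            return latestCommittedCheckpoints.get(writerId);
        } finally {
            tableLock.unlock();
        }
    }

    public static void main(String[] args) {
        MultiWriterCheckpointSketch table = new MultiWriterCheckpointSketch();
        // Two writers with made-up identifiers and checkpoint values.
        table.commitWithCheckpoint("deltastreamer-1", "checkpoint-a-100");
        table.commitWithCheckpoint("deltastreamer-2", "checkpoint-b-7");
        System.out.println(table.checkpointFor("deltastreamer-1"));
        System.out.println(table.checkpointFor("deltastreamer-2"));
    }
}
```

Keying the map by writer identifier means a commit from one deltastreamer carries the other's checkpoint forward unchanged, so each writer can resume from its own entry regardless of which one committed last.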

       

      These are the changes I can think of so far. Some additional minor fixes may turn out to be needed during implementation.

       

      Request from a user: https://github.com/apache/hudi/issues/6718

       

            People

              Assignee: Unassigned
              Reporter: sivabalan narayanan (shivnarayan)