  Apache Hudi / HUDI-5176

Incremental source may miss commits if there are inflight commits before completed commits


Details


    Description

      Consider the following scenario with concurrent writers. Writer 1 starts a commit at instant t1, and later writer 2 starts another commit at instant t2 (t2 > t1). Commit t2 completes before commit t1 does.

      ---------------------------------------------------------> t
       instant t1 |------------------------------| (writer 1)
       instant t2         |--------------|         (writer 2) 

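      This leaves an inflight commit (t1) before a completed commit (t2) on the Hudi timeline. The incremental pull uses only completed commits to determine the start and end instants of the incremental query and to advance the checkpoint, so once t2 completes, the checkpoint moves to t2. When t1 completes later, its instant time is still earlier than the checkpoint, so subsequent pulls, which only read instants after the checkpoint, skip it. As a result, the data written by such inflight commits may never be pulled from the incremental source.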
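The failure mode can be reproduced with a toy model. This is a minimal sketch, not Hudi's implementation: it assumes a simplified timeline in which instant times are plain integers, and an incremental pull that reads every completed instant after the checkpoint and then advances the checkpoint to the latest instant it read. `Timeline`, `incremental_pull` and `simulate` are hypothetical names, not Hudi APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Hypothetical, simplified timeline: instant time -> state."""
    states: dict[int, str] = field(default_factory=dict)

    def start(self, instant: int) -> None:
        self.states[instant] = "INFLIGHT"

    def complete(self, instant: int) -> None:
        self.states[instant] = "COMPLETED"

    def completed_after(self, checkpoint: int) -> list[int]:
        # Only completed instants strictly after the checkpoint are visible
        # to this naive incremental pull; inflight instants are ignored.
        return sorted(t for t, s in self.states.items()
                      if s == "COMPLETED" and t > checkpoint)


def incremental_pull(timeline: Timeline, checkpoint: int) -> tuple[list[int], int]:
    """Return the instants read in this round and the new checkpoint.

    The checkpoint advances to the latest completed instant that was read
    (assumed behaviour for illustration only).
    """
    instants = timeline.completed_after(checkpoint)
    new_checkpoint = instants[-1] if instants else checkpoint
    return instants, new_checkpoint


def simulate() -> list[list[int]]:
    tl = Timeline()
    checkpoint = 0
    rounds = []

    tl.start(1)     # writer 1 starts instant t1
    tl.start(2)     # writer 2 starts instant t2 (t2 > t1)
    tl.complete(2)  # t2 completes while t1 is still inflight

    read, checkpoint = incremental_pull(tl, checkpoint)
    rounds.append(read)  # reads [2]; checkpoint moves to 2

    tl.complete(1)  # t1 finally completes, but 1 < checkpoint
    read, checkpoint = incremental_pull(tl, checkpoint)
    rounds.append(read)  # reads []; t1's data is never pulled
    return rounds


if __name__ == "__main__":
    print(simulate())  # prints [[2], []]
```

The second pull returns nothing even though t1 is now complete, because the checkpoint already sits past t1's instant time.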

       


            People

              Assignee: Ethan Guo (guoyihua)
              Reporter: Ethan Guo (guoyihua)
              Votes: 0
              Watchers: 1
