Apache Hudi / HUDI-6738

Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • None

    Description

      A recent refactoring that added support for batching within a commit in the GCS incremental job moved object filtering to after checkpoint batching. This causes a problem in bootstrap scenarios where only the latest commits are of interest: the job has to walk through the entire set of commits, one `sourceLimit`-sized batch at a time, instead of skipping directly to the latest commit.

      The fix is to apply the object filter before checkpoint batching starts. This change brings the GCS job in line with the S3 job.
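      To make the ordering problem concrete, here is a minimal Java 17 sketch. It is not Hudi's implementation: `GcsObject`, `Batch`, `batch`, the `.json`-only filter, and the composite `commitTime#key` checkpoint are hypothetical stand-ins, and `sourceLimit` is modeled as a byte budget per run. It counts how many incremental runs it takes to reach the first object that passes the filter when 100 older commits contain only non-matching objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Hudi's GcsEventsHoodieIncrSource code: contrasts
// applying an object filter after vs. before checkpoint batching.
public class FilterBeforeBatchingSketch {

    // Hypothetical object metadata: commit it arrived in, object key, size in bytes.
    record GcsObject(String commitTime, String key, long sizeBytes) {
        // Composite checkpoint position; assumes fixed-width commit times so string
        // order matches commit order, and lets a batch end in the middle of a commit.
        String position() {
            return commitTime + "#" + key;
        }
    }

    // Objects selected for one run plus the checkpoint the next run resumes from.
    record Batch(List<GcsObject> objects, String nextCheckpoint) {}

    // Hypothetical filter: only .json objects are ingested.
    static final Predicate<GcsObject> JSON_ONLY = o -> o.key().endsWith(".json");

    // Takes objects after the checkpoint in position order until the next one
    // would push the total size past sourceLimit (at least one object is taken).
    static Batch batch(List<GcsObject> objects, String checkpoint, long sourceLimit) {
        List<GcsObject> sorted = new ArrayList<>(objects);
        sorted.sort(Comparator.comparing(GcsObject::position));
        List<GcsObject> selected = new ArrayList<>();
        long total = 0;
        String next = checkpoint;
        for (GcsObject o : sorted) {
            if (o.position().compareTo(checkpoint) <= 0) {
                continue; // consumed by an earlier run
            }
            if (!selected.isEmpty() && total + o.sizeBytes() > sourceLimit) {
                break;
            }
            selected.add(o);
            total += o.sizeBytes();
            next = o.position();
        }
        return new Batch(selected, next);
    }

    // Old ordering: batch first, then drop non-matching objects from the batch.
    static Batch batchThenFilter(List<GcsObject> objs, String ckpt, long limit, Predicate<GcsObject> filter) {
        Batch b = batch(objs, ckpt, limit);
        return new Batch(b.objects().stream().filter(filter).toList(), b.nextCheckpoint());
    }

    // Fixed ordering: filter first, so the batch only spans matching objects.
    static Batch filterThenBatch(List<GcsObject> objs, String ckpt, long limit, Predicate<GcsObject> filter) {
        return batch(objs.stream().filter(filter).toList(), ckpt, limit);
    }

    // Counts runs until one returns data; -1 if the checkpoint stops advancing first.
    static int runsUntilFirstMatch(List<GcsObject> objs, long limit, Predicate<GcsObject> filter, boolean filterFirst) {
        String ckpt = "";
        for (int run = 1; ; run++) {
            Batch b = filterFirst
                    ? filterThenBatch(objs, ckpt, limit, filter)
                    : batchThenFilter(objs, ckpt, limit, filter);
            if (!b.objects().isEmpty()) {
                return run;
            }
            if (b.nextCheckpoint().equals(ckpt)) {
                return -1;
            }
            ckpt = b.nextCheckpoint();
        }
    }

    // 100 older commits with one 10-byte .tmp object each, then one matching object.
    static List<GcsObject> sampleObjects() {
        List<GcsObject> objs = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            objs.add(new GcsObject(String.format("%03d", i), "f.tmp", 10));
        }
        objs.add(new GcsObject("101", "data.json", 10));
        return objs;
    }

    public static void main(String[] args) {
        List<GcsObject> objs = sampleObjects();
        System.out.println("filter after batching:  " + runsUntilFirstMatch(objs, 50, JSON_ONLY, false) + " runs");
        System.out.println("filter before batching: " + runsUntilFirstMatch(objs, 50, JSON_ONLY, true) + " runs");
    }
}
```

      With a 50-byte limit and 10-byte objects, filtering after batching spends 20 runs advancing the checkpoint through batches that the filter then empties, and only reaches `data.json` on run 21; filtering first reaches it on run 1.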


            People

              Assignee: Unassigned
              Reporter: Lokesh Lingarajan (linlok)
              Votes: 0
              Watchers: 1
