Apache Hudi / HUDI-1723

DFSPathSelector skips files with the same modify date when read up to source limit


    Details

      Description

      org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles filters the input files against the last saved checkpoint, which is the modification time of the last file read. However, several files can share that modification time. When a read stops at the source limit partway through such a group, the remaining files in the group are skipped by the next read. An illustration is shown in the attached picture.
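
      The behavior described above can be reproduced with a small standalone sketch. It does not use Hudi or Hadoop classes: FileInfo, Batch, select, and sampleFiles are hypothetical names, and the selection loop is an assumed simplification of what the description says (keep files newer than the checkpoint, read in modification-time order until the source limit, save the last read file's modification time). The keepTies flag shows one possible mitigation, which is an assumption here and not necessarily the fix adopted by the project: keep reading past the limit while the next file shares the last read file's modification time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CheckpointSkipSketch {

    /** Hypothetical stand-in for a file listing entry (not a Hudi or Hadoop class). */
    static final class FileInfo {
        final String name;
        final long modTime;
        final long length;

        FileInfo(String name, long modTime, long length) {
            this.name = name;
            this.modTime = modTime;
            this.length = length;
        }
    }

    /** Result of one read round: selected file names and the checkpoint to save. */
    static final class Batch {
        final List<String> names = new ArrayList<>();
        long checkpoint;
    }

    /**
     * Selects files newer than lastCheckpoint, in modification-time order,
     * until adding the next file would exceed sourceLimit bytes.
     * With keepTies=true (assumed mitigation), files that share the modification
     * time of the last selected file are still included after the limit is hit.
     */
    static Batch select(List<FileInfo> files, long lastCheckpoint, long sourceLimit, boolean keepTies) {
        List<FileInfo> eligible = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.modTime > lastCheckpoint) {
                eligible.add(f);
            }
        }
        eligible.sort(Comparator.comparingLong((FileInfo f) -> f.modTime));

        Batch batch = new Batch();
        batch.checkpoint = lastCheckpoint;
        long bytes = 0;
        for (FileInfo f : eligible) {
            boolean overLimit = bytes + f.length > sourceLimit;
            boolean sameTimeAsLastRead = keepTies && f.modTime == batch.checkpoint;
            if (overLimit && !sameTimeAsLastRead) {
                break;
            }
            batch.names.add(f.name);
            batch.checkpoint = f.modTime;
            bytes += f.length;
        }
        return batch;
    }

    /** Four 10-byte files; b and c share modification time 2. */
    static List<FileInfo> sampleFiles() {
        return Arrays.asList(
                new FileInfo("a", 1L, 10L),
                new FileInfo("b", 2L, 10L),
                new FileInfo("c", 2L, 10L),
                new FileInfo("d", 3L, 10L));
    }

    public static void main(String[] args) {
        List<FileInfo> files = sampleFiles();
        for (boolean keepTies : new boolean[] {false, true}) {
            Batch first = select(files, 0L, 20L, keepTies);
            Batch second = select(files, first.checkpoint, 20L, keepTies);
            System.out.println("keepTies=" + keepTies
                    + " round1=" + first.names + " round2=" + second.names);
        }
    }
}
```

      With a source limit of 20 bytes, the first round without keepTies stops after b and saves checkpoint 2, so the second round only sees d and never returns c. With keepTies enabled, c is read in the first round together with b, at the cost of exceeding the source limit for that round.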


              People

              • Assignee:
                xushiyan Raymond Xu
                Reporter:
                xushiyan Raymond Xu
