  1. Apache Hudi
  2. HUDI-1896

HudiStreamer Source for cloud object stores

Details

    • Implement DeltaStreamer Source for cloud object stores

    Description

      As discussed in HUDI-1723, we need a better implementation for cloud object stores such as AWS S3 or GCS, one that leverages change notifications.

      Also consider https://docs.databricks.com/spark/latest/structured-streaming/sqs.html
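
      To make the change-notification idea concrete, here is a minimal in-memory sketch. It assumes the object store publishes one "object created" event per new file to a queue, and that the source acknowledges events only after a successful commit. `ObjectCreatedEvent` and `NotificationQueueSketch` are hypothetical names for illustration; they are not Hudi, AWS, or GCS classes, and a real queue would add redelivery and ordering concerns not modelled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical "object created" notification; not a real SDK type.
final class ObjectCreatedEvent {
    final String bucket;
    final String key;
    final long sizeBytes;

    ObjectCreatedEvent(String bucket, String key, long sizeBytes) {
        this.bucket = bucket;
        this.key = key;
        this.sizeBytes = sizeBytes;
    }
}

// In-memory stand-in for a notification queue (assumption: FIFO delivery).
final class NotificationQueueSketch {
    private final Deque<ObjectCreatedEvent> pending = new ArrayDeque<>();

    // Called by the (simulated) object store when a new file lands.
    void publish(ObjectCreatedEvent event) {
        pending.addLast(event);
    }

    // Peek at the next events without removing them, up to a byte budget.
    // At least one event is returned if any is pending, so a single
    // oversized file cannot block the source forever.
    List<ObjectCreatedEvent> nextBatch(long sourceLimitBytes) {
        List<ObjectCreatedEvent> batch = new ArrayList<>();
        long bytes = 0;
        for (ObjectCreatedEvent event : pending) {
            if (!batch.isEmpty() && bytes + event.sizeBytes > sourceLimitBytes) {
                break;
            }
            batch.add(event);
            bytes += event.sizeBytes;
        }
        return batch;
    }

    // Remove events only after the batch has been committed downstream.
    void acknowledge(int count) {
        for (int i = 0; i < count && !pending.isEmpty(); i++) {
            pending.pollFirst();
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

      Because progress is tracked by acknowledged events rather than by a modification-time checkpoint, two files that land at the same instant are both delivered in this sketch, one batch after the other.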

      We need to look into the current *DFSSource classes and see if we can add a new `DFSPathSelector` implementation that fetches new files on cloud storage after a given point in time. The timestamp-based approach used by the existing path selector largely works, but has corner cases, as mentioned in HUDI-1723.
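
      As an illustration of timestamp-based selection, here is a simplified sketch that keeps files modified after a checkpoint, reads them in modification-time order up to a byte limit, and advances the checkpoint to the last file read. It also shows one way a corner case can arise; this is an assumption for illustration, not a restatement of HUDI-1723. `ListedFile`, `SelectedBatch`, and `TimestampSelectorSketch` are hypothetical names and do not reproduce the actual `DFSPathSelector` code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one entry of a directory listing.
final class ListedFile {
    final String path;
    final long modificationTime; // epoch millis
    final long sizeBytes;

    ListedFile(String path, long modificationTime, long sizeBytes) {
        this.path = path;
        this.modificationTime = modificationTime;
        this.sizeBytes = sizeBytes;
    }
}

// Hypothetical result: the paths to read and the checkpoint to store.
final class SelectedBatch {
    final List<String> paths;
    final long checkpoint;

    SelectedBatch(List<String> paths, long checkpoint) {
        this.paths = paths;
        this.checkpoint = checkpoint;
    }
}

final class TimestampSelectorSketch {
    static SelectedBatch select(List<ListedFile> listing, long lastCheckpoint, long sourceLimitBytes) {
        // Keep only files strictly newer than the last checkpoint.
        List<ListedFile> candidates = new ArrayList<>();
        for (ListedFile file : listing) {
            if (file.modificationTime > lastCheckpoint) {
                candidates.add(file);
            }
        }
        // Oldest first (stable sort, so ties keep listing order).
        candidates.sort(Comparator.comparingLong((ListedFile f) -> f.modificationTime));

        List<String> picked = new ArrayList<>();
        long bytes = 0;
        long checkpoint = lastCheckpoint;
        for (ListedFile file : candidates) {
            if (bytes + file.sizeBytes > sourceLimitBytes) {
                break; // byte budget exhausted
            }
            picked.add(file.path);
            bytes += file.sizeBytes;
            checkpoint = file.modificationTime;
        }
        return new SelectedBatch(picked, checkpoint);
    }
}
```

      In this sketch, if two files share the same modification time and the byte limit admits only the first, the checkpoint advances to that shared timestamp and the second file never satisfies the "strictly newer" filter on later runs.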

            People

              rmahindra Rajesh Mahindra
              xushiyan Shiyan Xu
