  1. Spark
  2. SPARK-10555

Add INotifyDStream to Spark Streaming

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels:

      Description

      Currently, Spark Streaming supports file streams (fileStream). While this is very useful in general, it has limitations, such as only being able to pick up new files placed directly in each monitored folder, not in nested subfolders.

      There are use cases (such as monitoring a root folder for incoming data, registering the files in Hive, and performing file-level replication across HDFS clusters) where taking action based on multi-level nested uploads is useful.
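
      To illustrate the gap, here is a minimal sketch in plain Python (not Spark code, and not the INotifyDStream implementation). It compares a flat scan of a root folder with a recursive scan when a file is uploaded several levels deep. All function names and the date-style folder layout are hypothetical, and the polling approach is only for illustration; an inotify-based stream would presumably react to events rather than re-list directories.

```python
import os
import tempfile


def list_flat(root):
    """Return files located directly inside root (no recursion)."""
    return {
        os.path.join(root, name)
        for name in os.listdir(root)
        if os.path.isfile(os.path.join(root, name))
    }


def list_nested(root):
    """Return files located anywhere under root, at any depth."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found


def new_files(lister, root, seen):
    """Return files not reported before and record them in seen."""
    fresh = lister(root) - seen
    seen |= fresh
    return fresh


def demo():
    """Create one top-level and one nested file, then scan both ways."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical nested upload layout: root/2015/09/10/b.log
        nested_dir = os.path.join(root, "2015", "09", "10")
        os.makedirs(nested_dir)
        for path in (os.path.join(root, "a.log"),
                     os.path.join(nested_dir, "b.log")):
            with open(path, "w") as handle:
                handle.write("payload\n")

        flat = new_files(list_flat, root, set())
        nested = new_files(list_nested, root, set())
        # Report paths relative to root so results are stable across runs.
        return (
            sorted(os.path.relpath(p, root) for p in flat),
            sorted(os.path.relpath(p, root) for p in nested),
        )


if __name__ == "__main__":
    flat_result, nested_result = demo()
    print("flat scan:  ", flat_result)
    print("nested scan:", nested_result)
```

      The flat scan misses the file under the nested folder, which is the kind of upload the use cases above depend on.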

      We have a proof-of-concept (POC) version of INotifyDStream that we are currently running in the staging environment at Uber. We would love to contribute it back if it makes sense for Spark.

              People

              • Assignee: Unassigned
              • Reporter: Vinoth Chandar (vinothchandar)
              • Votes: 11
              • Watchers: 11
