Spark / SPARK-10555

Add INotifyDStream to Spark Streaming

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DStreams

    Description

      Currently, Spark Streaming supports fileStream. While this is very useful in general, it has limitations, such as only being able to process new files directly under each monitored folder.

      Certain use cases require acting on multi-level nested uploads, for example monitoring a root folder for incoming data and then registering the files in Hive and performing file-level replication across HDFS clusters.

      We have a POC version of INotifyDStream that we are currently running in a staging environment at Uber. We would love to contribute it back if it makes sense for Spark.
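
      To illustrate the limitation described above, here is a minimal sketch in plain Python (not Spark code, and not the POC mentioned in this issue). It contrasts a single-level directory listing with a recursive walk that also picks up files in nested folders. The function names and the `incoming/batch1` layout are hypothetical and chosen only for this example.

```python
import os
import tempfile


def flat_new_files(root, seen):
    """Return unseen files located directly under root (single-level view)."""
    found = {entry.path for entry in os.scandir(root) if entry.is_file()}
    return found - seen


def nested_new_files(root, seen):
    """Return unseen files at any depth under root (recursive view)."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found - seen


def demo():
    """Create one top-level file and one nested file, then compare both views."""
    with tempfile.TemporaryDirectory() as root:
        # A file uploaded directly into the monitored root folder.
        open(os.path.join(root, "a.log"), "w").close()

        # A file uploaded into a nested folder (hypothetical layout).
        nested_dir = os.path.join(root, "incoming", "batch1")
        os.makedirs(nested_dir)
        open(os.path.join(nested_dir, "b.log"), "w").close()

        # Report paths relative to root so the result is easy to compare.
        flat = {os.path.relpath(p, root) for p in flat_new_files(root, set())}
        deep = {os.path.relpath(p, root) for p in nested_new_files(root, set())}
        return flat, deep


if __name__ == "__main__":
    flat, deep = demo()
    print("single-level view:", sorted(flat))
    print("recursive view:   ", sorted(deep))
```

      The single-level view reports only the file at the top of the root folder, while the recursive view also reports the nested one. An event-based approach such as the one proposed here would presumably avoid re-walking the tree on every batch, but that is an assumption and is not shown in this sketch.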

            People

              Assignee: Unassigned
              Reporter: Vinoth Chandar (vinothchandar)
              Votes: 11
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: