Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Description
Currently, Spark Streaming supports file streams (fileStream). While this is very useful in general, it has limitations, such as only picking up new files directly under the monitored folder.
There are use cases where acting on multi-level nested uploads is useful, such as monitoring a root folder for incoming data, registering the files in Hive, and performing file-level replication across HDFS clusters.
We have a proof-of-concept version of INotifyDStream that we currently run in a staging environment at Uber. We would love to contribute it back if it makes sense for Spark.
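To make the difference between single-folder and nested monitoring concrete, here is a minimal sketch in plain Python. It does not use the Spark or HDFS APIs and is not the INotifyDStream implementation; the paths and the helper names `is_direct_child` and `is_nested_under` are hypothetical, chosen only for illustration.

```python
from pathlib import PurePosixPath


def is_direct_child(root: str, path: str) -> bool:
    """True if path sits immediately under root (single-folder monitoring)."""
    return PurePosixPath(path).parent == PurePosixPath(root)


def is_nested_under(root: str, path: str) -> bool:
    """True if path sits anywhere below root, at any depth."""
    return PurePosixPath(root) in PurePosixPath(path).parents


# Hypothetical file-creation events, as an inotify-style feed might report them.
created = [
    "/data/incoming/a.csv",
    "/data/incoming/2016/01/b.csv",
    "/data/other/c.csv",
]

root = "/data/incoming"

# Files a monitor limited to the root folder itself would see.
direct = [p for p in created if is_direct_child(root, p)]

# Files a monitor covering the whole tree under root would see.
nested = [p for p in created if is_nested_under(root, p)]

print(direct)
print(nested)
```

In this sketch the file uploaded to /data/incoming/2016/01 only appears in the nested result, which is the case the proposed stream is meant to cover.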
Issue Links
- relates to: HDFS-6634 inotify in HDFS (Closed)
- links to