Currently, Spark Streaming can monitor a directory and process newly added files. This causes a bug when large files are copied into the directory. For example, in HDFS, a file that is still being copied is named file_name._COPYING_. Spark picks up that file and starts processing it. However, when the copy finishes, the file is renamed to file_name, so the path Spark picked up no longer exists, which causes a FileDoesNotExist error. It would be great if files in the directory could be excluded using a regex.
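
To make the request concrete, here is a minimal sketch of the kind of regex-based exclusion being asked for. It is plain Python and not an existing Spark API. The names `EXCLUDE_PATTERN`, `should_process`, and `select_files` are hypothetical, and the pattern assumes the in-progress suffix is `._COPYING_`.

```python
import re

# Hypothetical exclusion pattern: any name ending in the in-progress
# copy suffix (assumed here to be "._COPYING_").
EXCLUDE_PATTERN = re.compile(r".*\._COPYING_")


def should_process(file_name, exclude=EXCLUDE_PATTERN):
    """Return True if file_name does NOT match the exclusion regex."""
    return exclude.fullmatch(file_name) is None


def select_files(file_names, exclude=EXCLUDE_PATTERN):
    """Keep only the names that are safe to hand to the stream."""
    return [name for name in file_names if should_process(name, exclude)]


if __name__ == "__main__":
    # Illustrative directory listing, not real data.
    listing = ["part-0001", "part-0002._COPYING_", "events.log"]
    print(select_files(listing))
```

With a filter like this, a file that is still being copied would be skipped on the current scan and picked up on a later scan under its final name, once the rename has happened.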