Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
Description
It would be nice to be able to configure a staging directory for files being written to HDFS. Once the file stream is complete, the file would then be moved to the configured "final" directory.
One example use case where this helps is log files being analyzed by Hive. We could have a Hive table that points to an HDFS folder containing a bunch of log files. As it stands, if Flume is writing a tmp file into that directory and you fire up a MapReduce job, and that file then finishes being written (thus changing the filename), the job will fail because it can't find that file.
The current workaround is to use virtual columns (such as INPUT__FILE__NAME) to filter out .tmp files, but this is tedious to do for every query. It would be nice to have a staging directory that Flume can write files into; once it finishes streaming data to a file and closes it for writing, it can move the file to the final directory, as in the sketch below.
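A minimal sketch of the proposed write-to-staging-then-rename behavior, using the Hadoop FileSystem API directly rather than Flume itself; the staging and final paths below are hypothetical illustrations, not actual Flume configuration keys.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StagingDirWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical staging and final locations for illustration only.
            Path staging = new Path("/flume/staging/events.log");
            Path finalDir = new Path("/user/hive/warehouse/logs");

            // Write the in-progress file under the staging directory, so jobs
            // scanning the final directory never see a half-written file.
            FSDataOutputStream out = fs.create(staging);
            out.writeBytes("example log line\n");
            out.close();

            // Once the stream is closed, move the completed file into the
            // final directory that the Hive table points at.
            if (!fs.rename(staging, new Path(finalDir, staging.getName()))) {
                throw new RuntimeException("rename to final directory failed");
            }
        }
    }

Because a rename within a single HDFS filesystem is a metadata-only operation on the NameNode, the file appears in the final directory all at once, and a concurrently launched Hive or MapReduce job never observes a partially written file.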
Issue Links
- is superseded by FLUME-1702: HDFSEventSink should write to a hidden file as opposed to a .tmp file (Resolved)
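For context, the superseding FLUME-1702 takes the hidden-file route rather than a staging directory: in-progress files get a prefix that the default MapReduce input formats skip. A minimal sink config sketch, assuming the hdfs.inUsePrefix and hdfs.inUseSuffix properties documented for later Flume releases; the agent and sink names (a1, k1) are hypothetical.

    # Hypothetical agent/sink names; only the HDFS sink is shown.
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /user/hive/warehouse/logs
    # Files starting with "_" (or ".") are ignored by the default
    # MapReduce input formats, so Hive queries skip in-progress files.
    a1.sinks.k1.hdfs.inUsePrefix = _
    a1.sinks.k1.hdfs.inUseSuffix = .tmp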