Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
Description
It would be nice to be able to configure a staging directory for files being written to HDFS. Once the file stream is complete, the file would then be moved to the configured "final" directory.
One example use case where this helps is log files being analyzed by Hive. We could have a Hive table that points to an HDFS folder containing a bunch of log files. As it stands, if Flume is writing a tmp file into that directory and you fire up a MapReduce job, and that file then finishes being written (thus changing the filename), the job will fail because it can't find that file.
The current workaround is to use virtual columns (such as INPUT__FILE__NAME) to filter out .tmp files, but this is tedious to do for every query. It would be nice to have a staging directory that Flume can write files into; once it finishes streaming data to a file and closes it for writing, it can move the file to the final directory, as in the sketch below.
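A minimal sketch of the proposed write-to-staging-then-rename behavior, using the Hadoop FileSystem API directly rather than Flume itself; the staging and final paths below are hypothetical illustrations, not actual Flume configuration keys.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StagingDirWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical staging and final locations for illustration only.
            Path staging = new Path("/flume/staging/events.log");
            Path finalDir = new Path("/user/hive/warehouse/logs");

            // Write the in-progress file under the staging directory, so jobs
            // scanning the final directory never see a half-written file.
            FSDataOutputStream out = fs.create(staging);
            out.writeBytes("example log line\n");
            out.close();

            // Once the stream is closed, move the completed file into the
            // final directory that the Hive table points at.
            if (!fs.rename(staging, new Path(finalDir, staging.getName()))) {
                throw new RuntimeException("rename to final directory failed");
            }
        }
    }

Because a rename within a single HDFS filesystem is a metadata-only operation on the NameNode, the file appears in the final directory all at once, and a concurrently launched Hive or MapReduce job never observes a partially written file.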
Issue Links
- is superseded by FLUME-1702: HDFSEventSink should write to a hidden file as opposed to a .tmp file (Resolved)
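For context, the superseding FLUME-1702 takes the hidden-file route rather than a staging directory: in-progress files get a prefix that the default MapReduce input formats skip. A minimal sink config sketch, assuming the hdfs.inUsePrefix and hdfs.inUseSuffix properties documented for later Flume releases; the agent and sink names (a1, k1) are hypothetical.

    # Hypothetical agent/sink names; only the HDFS sink is shown.
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /user/hive/warehouse/logs
    # Files starting with "_" (or ".") are ignored by the default
    # MapReduce input formats, so Hive queries skip in-progress files.
    a1.sinks.k1.hdfs.inUsePrefix = _
    a1.sinks.k1.hdfs.inUseSuffix = .tmp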