The current demux-archive plumbing is quite complicated. At Berkeley, we need something much simpler.
Duplicate suppression in the archiver.
Taking silence for consent, I just committed this.
Revised, fixes a few unit test problems.
No. The archiver in this patch will, by default, group by cluster, day, and datatype, which is well suited to our use case: MapReduce analytics of logs.
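A minimal sketch of the default grouping described above: archived files are bucketed by cluster, day, and datatype. The method name `archiveGroup` and the path layout here are illustrative assumptions, not Chukwa's actual API.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ArchiveGrouping {
    // Build a grouping key "cluster/yyyyMMdd/datatype" for one chunk,
    // so records from the same cluster, day, and log type land together.
    static String archiveGroup(String cluster, long timestampMillis, String dataType) {
        SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");
        day.setTimeZone(TimeZone.getTimeZone("UTC"));
        return cluster + "/" + day.format(new Date(timestampMillis)) + "/" + dataType;
    }

    public static void main(String[] args) {
        // A MapReduce job over one log type then reads only its own bucket.
        System.out.println(archiveGroup("demo-cluster", 0L, "SysLog"));
    }
}
```

With this layout, an analytics job scans only the cluster/day/datatype directories it cares about instead of the whole archive.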
If there's no Demux, then the purpose of Chukwa will be just to collect logs, and store them in a single jumbled mix of all the log record types?
A future enhancement, once we have appends, is to actually merge files during promotion, and not just rename to avoid collision.
Simple sink archiver.
Copies all the .done files out of the sink, runs an archiver MapReduce job, and then merges the output of that job into the archive, renaming files to avoid collisions.
Intended use is to run once every day or two, to empty out the sink.
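The collision-avoiding rename step could look like the following sketch: if the target name already exists in the archive, append an increasing numeric suffix until a free name is found. This uses local `java.io.File` for illustration; the real job would go through the Hadoop FileSystem API, and `moveAvoidingCollision` is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;

public class CollisionFreeMove {
    // Move src into destDir, renaming "name" to "name.1", "name.2", ...
    // until no existing file is clobbered. Returns the final destination.
    static File moveAvoidingCollision(File src, File destDir) throws IOException {
        File dest = new File(destDir, src.getName());
        int suffix = 0;
        while (dest.exists()) {
            suffix++;
            dest = new File(destDir, src.getName() + "." + suffix);
        }
        if (!src.renameTo(dest))
            throw new IOException("rename failed: " + src + " -> " + dest);
        return dest;
    }
}
```

Once appends are available, this rename step is what a real merge during promotion would replace.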