Since we still don't have a wiki for Chukwa, I'll put more information here.
Corinne will work on documenting this, as with all the Chukwa documentation.
All the new daemons are responsible for taking data from the previous step and producing data for the next one.
Each one runs asynchronously from the others:
- Collector-> DataSink (input for DemuxManager)
|-> Demux output (ChukwaRecord, input for PostProcessorManager)
|-> move dataSink file to dataSinkArchive directory
|-> consume demux output, load to database
|-> move ChukwaRecord to /chukwa/repos/...
|-> every 2 hours compact dataSink files
|-> same as before except for a file name change: the file name now contains "HourlyDone", so I can guarantee that the hourly rolling was done
|-> same as before except that we now wait for hourlyRolling to finish before processing a day
>>What does DemuxManager do?
DemuxManager is a daemon process.
It takes care of scheduling Demux on DataSink files and limits the number of input files per Demux run. If DemuxManager has been killed, it forces a reprocess of any dataSink files that were part of the previous Demux run. After 3 failed attempts to process the same list of DataSink files, DemuxManager automatically moves those faulty dataSink files to an Error directory.
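
The batching and "3 attempts, then Error directory" behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the DemuxManager code: select_batch, process_batch, run_demux and error_dir are hypothetical names, and the sketch does not cover the reprocess-after-kill case.

```python
import shutil
from pathlib import Path

MAX_ATTEMPTS = 3  # the text above says files are set aside after 3 attempts


def select_batch(sink_dir, max_files):
    """Pick at most max_files files from sink_dir, in name order (hypothetical helper)."""
    files = sorted(p for p in Path(sink_dir).iterdir() if p.is_file())
    return files[:max_files]


def process_batch(files, run_demux, error_dir, max_attempts=MAX_ATTEMPTS):
    """Call run_demux(files) up to max_attempts times.

    Returns True as soon as one attempt succeeds. If every attempt fails,
    moves the whole batch to error_dir and returns False.
    """
    for _ in range(max_attempts):
        try:
            if run_demux(files):
                return True
        except Exception:
            pass  # an exception counts as a failed attempt in this sketch
    error_dir = Path(error_dir)
    error_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.move(str(f), str(error_dir / Path(f).name))
    return False
```

Here run_demux stands in for whatever launches the Demux job; it is assumed to return a truthy value on success.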
>>What does PostProcessorManager do?
It loads all Demux output into the database.
>>Do I need Nagios?
-No. If you don't add your Nagios information to chukwa-demux-conf.xml, DemuxManager will not send anything to Nagios.
>>What happens if I don't have Nagios, or I think I have it but it's down?
-Nothing. DemuxManager will try to send an NSCA command via a socket connection, but the outcome of this command has no impact on DemuxManager.
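
To illustrate why an unreachable Nagios does no harm, here is a minimal sketch of a best-effort notification in Python. It is not the actual DemuxManager code and it does not implement the NSCA protocol: notify_best_effort, the payload and the timeout are assumptions made for illustration only.

```python
import socket


def notify_best_effort(host, port, payload, timeout=2.0):
    """Try to send payload (bytes) to host:port; never raise on network errors.

    Returns True if the bytes were handed to the socket, False otherwise.
    The caller can ignore the result and carry on with its own work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
        return True
    except OSError:
        # connection refused, timeout, unknown host, ...: the monitor is
        # treated as optional, so the failure is swallowed here
        return False
```

Because every network error is caught inside the function, the caller's processing loop continues whether or not anything is listening on the other end.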