Chukwa / CHUKWA-26

Rewrite processSinkFiles.sh in Java for better error handling

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Data Processors
    • Labels:
      None
    1. CHUKWA-26.patch
      95 kB
      Jerome Boulon
    2. CHUKWA-26-2.patch
      95 kB
      Jerome Boulon
    3. CHUKWA-26-3.patch
      24 kB
      Eric Yang
    4. CHUKWA-26-4.patch
      96 kB
      Eric Yang
    5. NagiosAppender-1.5.0.jar
      19 kB
      Jerome Boulon

        Activity

        jboulon Jerome Boulon added a comment -

        Raise priority for 0.1.2 release.

        jboulon Jerome Boulon added a comment -

        Rewrite Demux pipeline

        • DemuxManager, ArchiveManager and PostProcessorManager are now each a single daemon process.
          Each one works independently of the others, as soon as something is available.
        • Start-data-processor now uses those new daemons instead of processSink.sh.
        • The daily rolling job runs its daily compaction only after all hourly jobs have completed.
        • Demux is now able to send NSCA commands to Nagios.
        jboulon Jerome Boulon added a comment -

        This jar is responsible for sending NSCA commands and should be placed in the lib directory.
        http://nagiosappender.sourceforge.net/

        jboulon Jerome Boulon added a comment -

        New Demux pipeline.
        It has been running in test for 5 days.
        The old one, processSink.sh, still works, and all previous scripts can still be used.

        eyang Eric Yang added a comment -

        -1, the recordTypes should be defined in a flat file, so that they can be added at run time. There is no reason to hard-code them in postProcess.sh.

        asrabkin Ari Rabkin added a comment -

        NagiosAppender is under the Apache license, so we should be able to include it directly.

        jboulon Jerome Boulon added a comment -

        To make everyone happy, PostProcessorManager now reads datasources from configuration instead of from the command line.

        asrabkin Ari Rabkin added a comment -

        I'd appreciate a little bit of javadoc per file and/or some external user documentation.

        What does DemuxManager do? How about PostProcessorManager and ArchiveManager?
        Do I need Nagios? What happens if I don't have it, or I think I have it but it's down?

        jboulon Jerome Boulon added a comment -

        Since we still don't have a wiki for Chukwa, I'll put more information here.
        Corinne will work on documenting this, along with the rest of the Chukwa documentation.

        Each new daemon is responsible for taking data from the previous step and producing data for the next one.
        Each one runs asynchronously from the others.

        • Collector-> DataSink (input for DemuxManager)
        • DemuxManager
          -> Demux output (ChukwaRecord, input for PostProcessorManager)
          -> move dataSink file to dataSinkArchive directory
        • PostProcessorManager
          -> consume demux output, load to database
          -> move ChukwaRecord to /chukwa/repos/...
        • ArchiveManager
          -> every 2 hours compact dataSink files
        • HourlyRolling
          -> same as before except for a file name change: the file name now contains "HourlyDone" so I can guarantee that the hourly rolling was done
        • DailyRolling
          -> same as before except that we are now waiting for hourlyRolling to be done before processing a day

        >>What does DemuxManager do?
        DemuxManager is a daemon process.
        It schedules Demux on DataSink files, limits the number of input files per Demux run, and forces a reprocess of any DataSink files that were part of the previous Demux run if DemuxManager was killed. After 3 attempts to process the same list of DataSink files, DemuxManager automatically moves those faulty DataSink files to an Error directory.
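
        The retry rule described above can be sketched roughly as follows. This is an illustration only, not code from the patch: the class name RetryTracker, the constant MAX_ATTEMPTS and the choice to key the counter on the exact list of file names are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: counts attempts per list of DataSink files. */
class RetryTracker {
    // Assumption: three attempts, as described in the comment above.
    static final int MAX_ATTEMPTS = 3;

    private final Map<List<String>, Integer> attempts = new HashMap<>();

    /**
     * Records one failed attempt for this exact list of files.
     * Returns true once the list has failed MAX_ATTEMPTS times, meaning
     * the caller should move the files to the error directory.
     */
    boolean recordFailure(List<String> dataSinkFiles) {
        List<String> key = new ArrayList<>(dataSinkFiles); // defensive copy
        int count = attempts.merge(key, 1, Integer::sum);
        if (count >= MAX_ATTEMPTS) {
            attempts.remove(key); // the list is given up on; forget it
            return true;
        }
        return false;
    }

    /** Clears the counter after a successful run on this list. */
    void recordSuccess(List<String> dataSinkFiles) {
        attempts.remove(new ArrayList<>(dataSinkFiles));
    }
}
```

        A caller would invoke recordFailure after each failed Demux run and move the files aside only when it returns true.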

        >>What does PostProcessorManager do?
        Load all demuxOutput to DB

        >>Do I need Nagios?
        No. If you do not add your Nagios information to chukwa-demux-conf.xml, DemuxManager will not send anything to Nagios.

        >>or I think I have it but it's down
        Nothing happens. DemuxManager will try to send an NSCA command via a socket connection, and a failure of that command has no impact on DemuxManager.
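
        The "no impact if Nagios is down" behaviour can be pictured as a best-effort send. The sketch below is an assumption-laden illustration: BestEffortNotifier and trySend are hypothetical names, and the payload is plain text rather than the real NSCA wire format, which the NagiosAppender jar handles.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a fire-and-forget notification over a socket. */
class BestEffortNotifier {
    /**
     * Tries to deliver the message. Returns false instead of throwing
     * when the server is unreachable, so the caller's own work is never
     * interrupted by a monitoring outage.
     */
    static boolean trySend(String host, int port, String message, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write(message.getBytes(StandardCharsets.UTF_8));
            out.flush();
            return true;
        } catch (IOException e) {
            // Connection refused, timeout or unknown host: ignore and carry on.
            return false;
        }
    }
}
```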

        jboulon Jerome Boulon added a comment -

        Nagios properties

        <property>
        <name>demux.nagiosHost</name>
        <value>myNagios.com</value>
        <description></description>
        </property>

        <property>
        <name>demux.nagiosPort</name>
        <value>5667</value>
        <description></description>
        </property>

        <property>
        <name>demux.reportingHost4Nagios</name>
        <value>myNagiosReportingHost</value>
        <description></description>
        </property>

        asrabkin Ari Rabkin added a comment -

        Awesome. Can you add some of that as class-level javadoc?

        zhangyongjiang Cheng added a comment -

        -1 I patched a couple of record types recently: Ps, JobData, TaskData and HDFSUsage. They are already in trunk. Please include them in your changes.

        jboulon Jerome Boulon added a comment -

        Cheng,
        I'm aware of JobData, TaskData and HDFSUsage; the first 20 lines of my patch are exactly for this.

        jboulon Jerome Boulon added a comment -

        I would like to resubmit a patch with some javadoc (per Ari's comments), but first I would like to make sure that there's nothing else that should be changed.
        Thanks,
        Jerome.

        eyang Eric Yang added a comment -

        Is moving data around the directories really necessary? The post-process pipeline asks the namenode to do a lot of directory shuffling when it is not necessary to incur this overhead on the namenode.
        This part of the process should be revisited.

        jboulon Jerome Boulon added a comment -

        Moving data to a dedicated working folder ensures that all processes are working on a persistent and synchronized view of the data.
        This simplifies the workflow.
        Since the number of files is limited and controlled, the NameNode can support this without any problem, and the actual data is not moved since I'm using a rename.
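
        As a local-file-system analogy only (this uses java.nio, not the HDFS call made in the patch), a rename changes where a file is listed without copying its bytes. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: claim a file by renaming it into a working directory. */
class WorkingDirMover {
    /**
     * Moves the file into workingDir with an atomic rename and returns
     * the new path. ATOMIC_MOVE fails rather than falling back to a
     * copy, so other processes never see a half-moved file.
     */
    static Path claim(Path file, Path workingDir) throws IOException {
        Files.createDirectories(workingDir);
        Path target = workingDir.resolve(file.getFileName());
        return Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```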

        eyang Eric Yang added a comment -

        finaArchive seems to create a lot of files. Could we fine tune this to reduce number of files?

        Demux should probably call System.gc() after each run to keep resource usage as lean as possible, since it's a daemon process.

        jboulon Jerome Boulon added a comment -

        >> finaArchive seems to create a lot of files. Could we fine tune this to reduce number of files?
        There's a parameter to control the number of output files: chukwaArchiveBuilder.reduceCount

        <property>
        <name>chukwaArchiveBuilder.reduceCount</name>
        <value>5</value>
        <description>Reduce count </description>
        </property>

        >> Demux probably should call System.gc after each run to keep resource usage as lean as possible since it's a daemon process.
        DemuxManager only schedules the Demux job; it does not run the computation itself, since Demux is an M/R job. Memory usage for DemuxManager should therefore stay low and stable.

        eyang Eric Yang added a comment -

        Fix Demux.java class. The previous patch doesn't apply properly.

        eyang Eric Yang added a comment -

        Redo patch to include new files.

        eyang Eric Yang added a comment -

        I just committed this, thanks Jerome.


          People

          • Assignee:
            jboulon Jerome Boulon
            Reporter:
            jboulon Jerome Boulon