CHUKWA-674: Integrate Chukwa collector feature into Chukwa agent

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Data Collection
    • Labels: None
    • Environment: MacOSX, Java 6

    • Release Note:
      Integrate Chukwa collector into Chukwa Agent

    Description

      The features offered by the Chukwa collector can be integrated into the Chukwa agent, and multi-tier Chukwa agents can be used to collect data for a large-scale cluster. For a small cluster, agents can talk directly to the HDFS cluster to reduce deployment complexity. The features required to remove the need for Chukwa collectors are:

      • Enhance the agent REST API to receive chunk data.
      • Add a pipeline writer to channel data to storage destinations (HDFS, HBase).
      • Improve the connector interface and replace the HTTP connector with a collector connector for bandwidth balancing.

    Activity

        Hudson added a comment -

        FAILURE: Integrated in Chukwa-trunk #491 (See https://builds.apache.org/job/Chukwa-trunk/491/)
        CHUKWA-674. Integrated Chukwa collector feature to Chukwa Agent. (Eric Yang) (eyang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606617)

        • /chukwa/trunk/CHANGES.txt
        • /chukwa/trunk/bin/chukwa
        • /chukwa/trunk/conf/chukwa-agent-conf.xml
        • /chukwa/trunk/conf/hbase.schema
        • /chukwa/trunk/pom.xml
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/analysis/salsa/visualization/Heatmap.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/analysis/salsa/visualization/Swimlanes.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/adaptor/sigar/SystemMetrics.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/agent/ChukwaAgent.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/collector/CollectorStub.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/collector/servlet/CommitCheckServlet.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/collector/servlet/LogDisplayServlet.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/collector/servlet/ServletCollector.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/collector/servlet/ServletDiagnostics.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/connector/PipelineConnector.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/test/FileTailerStressTest.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/writer/ExtractorWriter.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/writer/PipelineStageWriter.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/writer/SocketTeeWriter.java
        • /chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/writer/hbase/HBaseWriter.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/TestOffsetStatsManager.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/adaptor/TestSyslogAdaptor.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/adaptor/filetailer/TestLogRotate.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/adaptor/filetailer/TestRCheckAdaptor.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/agent/rest/TestAdaptorController.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/connector/TestFailedCollector.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/writer/TestHBaseWriter.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/datacollection/writer/TestSocketTee.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/dataloader/TestSocketDataLoader.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/rest/resource/TestClientTrace.java
        • /chukwa/trunk/src/test/java/org/apache/hadoop/chukwa/tools/backfilling/TestBackfillingLoader.java
        Eric Yang added a comment -

        I just committed this.

        Eric Yang added a comment -

        Shreyas,

        SeqFileWriter was built prior to the introduction of PipelinableWriter, which is why it cannot pass chunks on to the next writer and blocks the rest of the pipeline. If the configuration places SeqFileWriter last, it works fine. If a writer fails because of bad data, the chunk can be dropped. If a writer fails because a downstream system is unavailable, the same chunk can be retried. Retries can produce duplicated data, and the sequence ID helps to eliminate the duplication. Hence, this is working as designed.
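
        A minimal sketch of the pipeline behavior described above, using simplified stand-ins rather than the real Chukwa interfaces (only the add(List<Chunk>) shape is taken from this thread; everything else is illustrative):

            import java.util.List;

            // Simplified stand-ins for the Chukwa writer types; the real
            // interfaces live under org.apache.hadoop.chukwa.datacollection.writer.
            interface Chunk {}
            class WriterException extends Exception {}
            enum CommitStatus { COMMIT_OK, COMMIT_FAIL }

            interface ChukwaWriter {
              CommitStatus add(List<Chunk> chunks) throws WriterException;
            }

            // A pipeline-aware writer keeps a reference to the next stage and
            // forwards every batch after handling it locally.
            abstract class PipelinableWriter implements ChukwaWriter {
              protected ChukwaWriter next;
              void setNextStage(ChukwaWriter next) { this.next = next; }
            }

            class ForwardingWriter extends PipelinableWriter {
              @Override
              public CommitStatus add(List<Chunk> chunks) throws WriterException {
                // ... persist chunks to this stage's destination here ...
                // then hand the same batch to the next writer in the pipeline.
                return (next != null) ? next.add(chunks) : CommitStatus.COMMIT_OK;
              }
            }

            // By contrast, a pre-pipeline writer such as SeqFileWriter implements
            // ChukwaWriter directly and never forwards a batch, which is why it
            // only works as the last stage.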

        shreyas subramanya added a comment -

        Hi Eric,
        1. SeqFileWriter is not passing data to the next stage of the pipeline.
        2. An error in one pipeline stage will affect the remaining stages. For example, HBaseWriter throws an exception when HBase is down, and this will prevent other writers from seeing those chunks. So instead of a pipeline, how about we spawn a new thread for each pipeline stage and merge the results (see the sketch after this list)? This will also help us in the future when we do Kafka integration: each new writer can be a Kafka client thread and run at its own pace.
        3. Since we currently process JMX and other monitoring metrics within the demux map processors, maybe we should move the demux phase so it is invoked inside the PipelineConnector rather than in HBaseWriter? This would ensure the same data is uniformly available to all writers (for example, HBase and alert writers).
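
        A rough sketch of the fan-out idea in point 2, reusing the stand-in types from the earlier sketch; this is a proposal, not committed code:

            import java.util.List;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;

            // Point 2 as code: every writer gets its own thread, so a failing
            // stage (e.g. HBase being down) cannot block its siblings.
            class FanOutWriter implements ChukwaWriter {
              private final List<ChukwaWriter> writers;
              private final ExecutorService pool;

              FanOutWriter(List<ChukwaWriter> writers) {
                this.writers = writers;
                this.pool = Executors.newFixedThreadPool(writers.size());
              }

              @Override
              public CommitStatus add(List<Chunk> chunks) {
                for (ChukwaWriter w : writers) {
                  pool.submit(() -> {
                    try {
                      w.add(chunks);           // each writer runs at its own pace
                    } catch (WriterException e) {
                      // per-writer drop/retry policy; siblings are unaffected
                    }
                  });
                }
                return CommitStatus.COMMIT_OK;
              }
            }

        The open question with this approach is commit semantics: a serial pipeline yields one commit status per batch, while a fan-out has to merge per-writer results.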

        Eric Yang added a comment -

        This integrates the Chukwa collector writer feature into the Chukwa agent using the agent's connector feature. The default implementation configures PipelineConnector to invoke HBaseWriter to write data directly to HBase. It also includes a number of test case bug fixes to make the results more accurate.

        Eric Yang added a comment - edited

        Redesign the system to work like this:

        Adaptor -> ChukwaAgent -> Connector (configurable)

        The proposed implementation supports this:

        Adaptor -> ChukwaAgent -> PipelineConnector -> HBaseWriter

        There is a new property called chukwa.pipeline in the agent configuration which defines the list of writers that data is sent through. Users can also mirror data to both HDFS and HBase by defining both HBaseWriter and SeqFileWriter, as sketched below.
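
        As a hedged sketch, a chukwa.pipeline entry in chukwa-agent-conf.xml that mirrors data to both HBase and HDFS might look like this; the HBaseWriter class name matches a source path in this commit, while SeqFileWriter's package and the comma-separated, ordered value syntax are assumptions:

            <!-- Hypothetical sketch: order matters, and SeqFileWriter goes last
                 because it predates the pipelineable writer interface and does
                 not forward chunks to a next stage. -->
            <property>
              <name>chukwa.pipeline</name>
              <value>org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter</value>
            </property>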

        Eric Yang added a comment - edited

        Hi Shreyas, the connector interface is for configuration setup and connection setup. The HBase connector has different logic than the HTTP connector because the mechanisms for talking to the destination nodes are different. It is possible that there is no HBaseSender.send() in the HBase connector. The Chukwa agent should be modified to use ChukwaWriter as the default. CollectorWriter would host the logic of httpConnector. The EventQueue is only created for httpConnector. ChukwaWriter uses add(List<Chunk>) to push data out instead.

        shreyas subramanya added a comment -

        I wonder if we should do away with the "Connector" interface and just have a single connector class. Each class implementing this interface duplicates the same code: loop on fetching from the EventQueue and call ChukwaSender.send(). Further, each addition of a new sender (e.g., an HBase sender) would require adding a new connector and duplicating that code.

        When we integrate the collector into the agent, we would just need to add senders for sending over HTTP and for writing to HBase, HDFS, or the console, plus a pipeline sender. The agent configuration would then have an "agent.sender" property to choose a sender and an "agent.pipeline" property if the pipeline sender is chosen. A sketch of the single-connector idea follows.
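
        A sketch of that single generic connector, reusing the Chunk stand-in from the earlier sketch; the ChukwaSender signature and queue shape here are illustrative assumptions:

            import java.util.List;
            import java.util.concurrent.BlockingQueue;

            // Illustrative sender interface; the real ChukwaSender differs.
            interface ChukwaSender {
              void send(List<Chunk> chunks) throws Exception;
            }

            // One connector class for every sender: the loop that each current
            // Connector implementation duplicates, written exactly once.
            class GenericConnector implements Runnable {
              private final BlockingQueue<List<Chunk>> eventQueue;
              private final ChukwaSender sender; // chosen via "agent.sender"
              private volatile boolean running = true;

              GenericConnector(BlockingQueue<List<Chunk>> eventQueue, ChukwaSender sender) {
                this.eventQueue = eventQueue;
                this.sender = sender;
              }

              @Override
              public void run() {
                while (running) {
                  try {
                    List<Chunk> batch = eventQueue.take(); // fetch from EventQueue
                    sender.send(batch);                    // ChukwaSender.send()
                  } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    running = false;
                  } catch (Exception e) {
                    // a shared retry/backoff policy for all senders would go here
                  }
                }
              }
            }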


    People

    • Assignee: Eric Yang
    • Reporter: Eric Yang
    • Votes: 0
    • Watchers: 3


    Time Tracking

    • Original Estimate: 10h
    • Remaining Estimate: 10h
    • Time Spent: Not Specified
