Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Introduced Chukwa data collection and analysis framework.

      Description

      We'd like to contribute Chukwa, a data collection and analysis framework being developed at Yahoo!. Chukwa is a natural complement to Hadoop, since it is built on top of HDFS and Map-Reduce, and since Hadoop clusters are a key use case.

      1. chukwa-patch-0.0.1.tar.gz
        5.26 MB
        Eric Yang
      2. chukwa_08.pdf
        151 kB
        Ari Rabkin

          Activity

          Jerome Boulon added a comment -

          Hi Alex,
          If you search for "chukwa" on the JIRA site you'll see a list of patches that we want to commit, but we're depending on external Apache committers to get them committed.

          /Jerome.

          Alex Loddengaard added a comment -

          Ari, you and Jerome, in an email thread a week or so back, mentioned that you were planning on releasing a second Chukwa patch. Any updates here?

          Hudson added a comment -

          Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )

          Tsz Wo Nicholas Sze added a comment -

          The jar files in .../chukwa/lib lead to some javadoc warnings. See HADOOP-3949

          Owen O'Malley added a comment -

          I just committed this. Thanks, guys!

          Eric Yang added a comment -

          Removed all LGPL-dependent libraries, and included README and license files for all dependent libraries.

          Eric Yang added a comment -

          Removed LGPL libraries, and included README and LICENSE files for all dependent libraries.

          Owen O'Malley added a comment -

          The current patch is still missing license files for a lot of the jar files, and it includes LGPL libraries, which can't be included. It would probably help to have a README in the lib directory that lists each jar file, the project it comes from, and its license.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12387671/chukwa-patch-0.0.1.tgz
          against trunk revision 683448.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3026/console

          This message is automatically generated.

          Eric Yang added a comment -

          Polished patch to meet requirements.

          Eric Yang added a comment -

          Polished patch to meet patching requirements.

          Eric Yang added a comment -
          • Removed openflashchart from the source and compiled it into a library file.
          • Added licenses for library files.
          • Added licenses for all Java source files.
          • Compressed the tarball relative to HADOOP_HOME.
          • Ran the RAT audit tool successfully.
          • Removed empty directories.
          Owen O'Malley added a comment -

          Please include license files parallel to the included jars that are not Apache projects.

          Please make the tarball relative to $HADOOP_HOME.

          Make sure you don't have any empty directories (or others that shouldn't be checked in).

          Please run the release audit tool over the submission to make sure that your source files all have copyright notices.
          I notice that inputtools/mdl/DBSummaryLoader.java and JobLogDataLoader.java do not.

          Please remove the source code for org.openflashchart.

          Ari Rabkin added a comment -

          Initial chukwa release.

          Jerome Boulon added a comment -

          Yes, we're planning to add Chukwa to the Hadoop tree as a contrib module within the next few days.
          /Jerome

          Doug Cutting added a comment -

          How will this be integrated with Hadoop? As a contrib module?

          Ari Rabkin added a comment -

          Pete –
          Yes, the sink file writers are pluggable. In fact, our current writer uses the Hadoop FileSystem class, so I believe that if you pass a local path that points at NFS, it'll "just work". We haven't tested that, though.
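          The pluggability described here can be sketched as follows. With Hadoop's actual FileSystem API, the target store is selected by the URI scheme of the path (hdfs://, file://, ...), which is why a local path pointing at an NFS mount should "just work". The stdlib-only sketch below makes the same point without a Hadoop dependency: the collector code depends on an abstract writer, so swapping HDFS for an NFS-mounted directory does not change it. The SinkWriter and PathSinkWriter names, and the /mnt/nfs path, are illustrative assumptions, not Chukwa's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Illustrative writer abstraction (not the Chukwa API): the collector only
// sees this interface, so where the sink file lives is a detail of whichever
// writer implementation is plugged in.
interface SinkWriter {
    void append(List<String> records) throws IOException;
}

// Writes through java.nio. Any Path works unchanged: a local disk, or an NFS
// mount such as /mnt/nfs/chukwa/sink (hypothetical mount point). Chukwa's
// real writer goes through org.apache.hadoop.fs.FileSystem instead, where a
// file:// URI selects the local filesystem in the same spirit.
class PathSinkWriter implements SinkWriter {
    private final Path sink;

    PathSinkWriter(Path sink) { this.sink = sink; }

    @Override
    public void append(List<String> records) throws IOException {
        Files.write(sink, records,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}

public class SinkWriterSketch {
    // Appends two records through the abstraction and returns the sink contents.
    static List<String> demo() throws IOException {
        Path sink = Files.createTempFile("sink", ".dat"); // stands in for an NFS path
        SinkWriter writer = new PathSinkWriter(sink);
        writer.append(List.of("record-1", "record-2"));
        return Files.readAllLines(sink);
    }

    public static void main(String[] args) throws IOException {
        demo().forEach(System.out::println);
    }
}
```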

          Pete Wyckoff added a comment -

          Do the "sink files" need to be in HDFS or is this pluggable as well so I could write to other filesystems, e.g., NFS?

          Mac Yang added a comment -

          Very happy to see interest in this work.

          We are planning on doing the initial check-in in a week or so. Hopefully that will give folks a better idea of what we are trying to do and also serve as the starting point for collaboration.

          Enis Soztutar added a comment -

          Wow, that's just what I needed.
          When can we expect the patch?

          Ari Rabkin added a comment -

          Chukwa is designed to collect monitoring data (especially log files), and get the data into HDFS as quickly as possible. Data is initially collected by a Local Agent running on each machine being monitored. This Local Agent has a pluggable architecture, allowing many different adaptors to be used, each of which produces a particular stream of data. Local Agents send their data via HTTP to Collectors, which write out data into "sink files" in HDFS.

          Map-reduce jobs run periodically to analyze these sink files, and to drain their contents into structured storage.

          Chukwa provides a natural solution to the log collection problem, posed in HADOOP-2206. Once we have Chukwa working at scale, we intend to produce some patches to Hadoop to trigger log collection appropriately.

          We expect this work to ultimately be complementary to HADOOP-3585, the failure analysis system. We want to collect similar data, and our framework is flexible enough to accommodate the proposed structure there, with only modest code changes on each side.

          The attached document introduces Chukwa, and describes the data collection architecture. We do not present our analytics and visualization in detail in this document. We intend to describe them in a second document in the near future.
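          The data path described above (adaptor → local agent → collector → sink file) can be sketched in miniature. This is a hedged, stdlib-only illustration: the names (Adaptor, TailAdaptor, Collector) are hypothetical and do not reflect Chukwa's actual API, a direct method call stands in for the agent's HTTP hop, and a local temp file stands in for an HDFS sink file.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical stand-in for a Chukwa adaptor: one pluggable source that
// produces a particular stream of monitoring data.
interface Adaptor {
    List<String> poll() throws IOException; // next batch of records
}

// Example adaptor: tails a log file, returning only the lines added since
// the previous poll.
class TailAdaptor implements Adaptor {
    private final Path logFile;
    private int linesRead = 0;

    TailAdaptor(Path logFile) { this.logFile = logFile; }

    @Override
    public List<String> poll() throws IOException {
        List<String> all = Files.readAllLines(logFile);
        List<String> fresh = List.copyOf(all.subList(linesRead, all.size()));
        linesRead = all.size();
        return fresh;
    }
}

// Stand-in collector: in Chukwa the agent posts batches over HTTP and the
// collector appends them to a sink file in HDFS; here a direct call and a
// local file play both roles.
class Collector {
    private final Path sinkFile;

    Collector(Path sinkFile) { this.sinkFile = sinkFile; }

    void receive(String sourceHost, List<String> records) throws IOException {
        try (Writer w = Files.newBufferedWriter(sinkFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String record : records) {
                w.write(sourceHost + "\t" + record + System.lineSeparator());
            }
        }
    }
}

public class PipelineSketch {
    // Runs one agent cycle and returns the sink file's contents.
    static List<String> runOnce() throws IOException {
        Path log = Files.createTempFile("app", ".log");
        Files.write(log, List.of("started", "request ok"));
        Path sink = Files.createTempFile("chukwa", ".sink");

        Adaptor adaptor = new TailAdaptor(log);
        Collector collector = new Collector(sink);
        collector.receive("host01", adaptor.poll());
        return Files.readAllLines(sink);
    }

    public static void main(String[] args) throws IOException {
        runOnce().forEach(System.out::println);
    }
}
```

          The map-reduce jobs that drain the sink files would then consume records in this host-tagged form; that analysis stage is outside the scope of this sketch.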


            People

            • Assignee:
              Ari Rabkin
            • Reporter:
              Ari Rabkin
            • Votes:
              0
            • Watchers:
              17
