Key: HADOOP-3719
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Ari Rabkin
Reporter: Ari Rabkin
Votes: 0
Watchers: 18
Hadoop Common

Chukwa

Created: 08/Jul/08 11:35 PM   Updated: 08/Jul/09 04:40 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.19.0

Time Tracking:
Not Specified

File Attachments:
chukwa-patch-0.0.1.tar.gz (GZip archive, 5.26 MB), 2008-08-12 02:13 AM, Eric Yang; licensed for inclusion in ASF works
chukwa_08.pdf (PDF, 151 kB), 2008-07-08 11:38 PM, Ari Rabkin
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Release Note: Introduced Chukwa data collection and analysis framework.
Resolution Date: 12/Aug/08 10:39 PM


Description
We'd like to contribute Chukwa, a data collection and analysis framework being developed at Yahoo!. Chukwa is a natural complement to Hadoop, since it is built on top of HDFS and Map-Reduce, and since Hadoop clusters are a key use case.

Ari Rabkin added a comment - 08/Jul/08 11:38 PM
Chukwa is designed to collect monitoring data (especially log files), and get the data into HDFS as quickly as possible. Data is initially collected by a Local Agent running on each machine being monitored. This Local Agent has a pluggable architecture, allowing many different adaptors to be used, each of which produces a particular stream of data. Local Agents send their data via HTTP to Collectors, which write out data into "sink files" in HDFS.
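The data path described above can be sketched roughly as follows. This is an illustration only: the class names (`Adaptor`, `LineListAdaptor`, `LocalAgent`, `Collector`) are hypothetical and are not Chukwa's API, and the HTTP hop between agent and collector is replaced by a direct function call so the example stays self-contained.

```python
# Illustrative sketch only: these class names are hypothetical, not Chukwa's API.
from typing import Callable, Iterable, List, Protocol


class Adaptor(Protocol):
    """A pluggable source that produces one stream of data chunks."""

    def poll(self) -> Iterable[bytes]: ...


class LineListAdaptor:
    """Stand-in for a log-file adaptor: emits each pending line exactly once."""

    def __init__(self, stream_name: str, lines: List[str]) -> None:
        self.stream_name = stream_name
        self._pending = list(lines)

    def poll(self) -> Iterable[bytes]:
        # Tag every line with its stream name, then forget it.
        chunks = [f"{self.stream_name}\t{line}".encode() for line in self._pending]
        self._pending.clear()
        return chunks


class Collector:
    """Stand-in for a Collector: appends every received chunk to an in-memory sink."""

    def __init__(self) -> None:
        self.sink: List[bytes] = []

    def receive(self, chunks: List[bytes]) -> None:
        self.sink.extend(chunks)


class LocalAgent:
    """Runs on a monitored machine and gathers chunks from all registered adaptors."""

    def __init__(self, send: Callable[[List[bytes]], None]) -> None:
        self._adaptors: List[Adaptor] = []
        self._send = send  # assumption: a real agent would do an HTTP POST here

    def register(self, adaptor: Adaptor) -> None:
        self._adaptors.append(adaptor)

    def flush(self) -> int:
        """Poll every adaptor, forward the batch, and return the chunk count."""
        batch = [chunk for adaptor in self._adaptors for chunk in adaptor.poll()]
        if batch:
            self._send(batch)
        return len(batch)


collector = Collector()
agent = LocalAgent(send=collector.receive)
agent.register(LineListAdaptor("syslog", ["boot ok", "disk warning"]))
agent.register(LineListAdaptor("app", ["request served"]))
sent = agent.flush()
```

Keeping the adaptor interface down to a single `poll` method is one way to let new data sources be added without touching the agent; the actual Chukwa interfaces may look quite different.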

Map-reduce jobs run periodically to analyze these sink files, and to drain their contents into structured storage.

Chukwa provides a natural solution to the log collection problem posed in HADOOP-2206. Once we have Chukwa working at scale, we intend to produce some patches to Hadoop to trigger log collection appropriately.

We expect this work to ultimately be complementary to HADOOP-3585, the failure analysis system. We want to collect similar data, and our framework is flexible enough to accommodate the proposed structure there, with only modest code changes on each side.

The attached document introduces Chukwa, and describes the data collection architecture. We do not present our analytics and visualization in detail in this document. We intend to describe them in a second document in the near future.


Enis Soztutar added a comment - 10/Jul/08 12:34 PM
Wow, that's just what I needed.
When can we expect the patch?

Mac Yang added a comment - 11/Jul/08 02:19 AM
Very happy to see interest in this work.

We are planning on doing the initial check-in in a week or so. Hopefully that will give folks a better idea of what we are trying to do and also serve as the starting point for collaboration.


Pete Wyckoff added a comment - 14/Jul/08 06:06 PM
Do the "sink files" need to be in HDFS or is this pluggable as well so I could write to other filesystems, e.g., NFS?

Ari Rabkin added a comment - 14/Jul/08 06:21 PM
Pete –
Yes, the sink file writers are pluggable. In fact, our current writer uses the Hadoop FileSystem class, so I believe that if you pass a local path that points at NFS, it'll "just work". We haven't tested that, though.
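To illustrate the idea of a pluggable sink writer that targets an ordinary path, here is a minimal sketch. It uses plain file I/O rather than the Hadoop FileSystem class, and the names `SinkWriter`, `LocalPathSinkWriter`, `collect`, and the file name `chukwa.sink` are hypothetical, chosen only for this example. Like the behaviour described above, the NFS case is untested here; a temporary directory stands in for the mount point.

```python
# Illustrative sketch only; not the Chukwa writer or the Hadoop FileSystem API.
import tempfile
from pathlib import Path
from typing import List, Protocol


class SinkWriter(Protocol):
    """Anything that can persist a batch of chunks."""

    def write(self, chunks: List[bytes]) -> None: ...


class LocalPathSinkWriter:
    """Appends chunks to a sink file under any mounted path (local disk, NFS, ...)."""

    def __init__(self, sink_dir: Path, name: str = "chukwa.sink") -> None:
        sink_dir.mkdir(parents=True, exist_ok=True)
        self.path = sink_dir / name

    def write(self, chunks: List[bytes]) -> None:
        # Open in append mode so repeated batches accumulate in one sink file.
        with self.path.open("ab") as sink:
            for chunk in chunks:
                sink.write(chunk + b"\n")


def collect(writer: SinkWriter, chunks: List[bytes]) -> None:
    """The collector depends only on the SinkWriter interface, not on a filesystem."""
    writer.write(chunks)


sink_dir = Path(tempfile.mkdtemp())  # swap in an NFS mount point to try that case
writer = LocalPathSinkWriter(sink_dir)
collect(writer, [b"syslog\tboot ok", b"app\trequest served"])
written = writer.path.read_bytes()
```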

Doug Cutting added a comment - 15/Jul/08 04:46 PM
How will this be integrated with Hadoop? As a contrib module?

Jerome Boulon added a comment - 15/Jul/08 04:58 PM
Yes, we're planning to add Chukwa to the Hadoop tree as a contrib module within the next few days.
/Jerome

Ari Rabkin added a comment - 17/Jul/08 08:02 PM
Initial chukwa release.

Owen O'Malley added a comment - 18/Jul/08 10:35 PM
Please include license files parallel to the included jars that are not Apache projects.

Please make the tarball relative to $HADOOP_HOME.

Make sure you don't have any empty directories (or others that shouldn't be checked in).

Please run the release audit tool over the submission to make sure that your source files all have copyright notices.
I notice that inputtools/mdl/DBSummaryLoader.java and JobLogDataLoader.java do not.

Please remove the source code for org.openflashchart.


Eric Yang added a comment - 19/Jul/08 12:09 AM
  • Removed openflashchart from source and compiled into library file.
  • Added licenses for library files.
  • Added licenses for all java source files.
  • Compressed tarball relative to HADOOP_HOME.
  • Ran RAT audit tool successfully.
  • Removed empty directory.

Eric Yang added a comment - 06/Aug/08 07:17 PM
Polish patch to meet patching requirement.

Eric Yang added a comment - 06/Aug/08 07:20 PM
Polish patch to meet requirement.

Hadoop QA added a comment - 07/Aug/08 12:01 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12387671/chukwa-patch-0.0.1.tgz
against trunk revision 683448.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3026/console

This message is automatically generated.


Owen O'Malley added a comment - 11/Aug/08 09:58 PM
The current patch is still missing license files for a lot of the jar files and it includes LGPL libraries, which can't be included. It would probably help to have a README in the lib directory that lists the jar files and which project they are from and their license.

Eric Yang added a comment - 12/Aug/08 02:10 AM
Removed LGPL libraries, and included README and LICENSE files for all dependent libraries.

Eric Yang added a comment - 12/Aug/08 02:13 AM
Removed all LGPL dependent libraries, and included readme and license files for all dependent libraries.

Owen O'Malley added a comment - 12/Aug/08 10:39 PM
I just committed this. Thanks, guys!

Tsz Wo (Nicholas), SZE added a comment - 13/Aug/08 09:29 PM
The jar files in .../chukwa/lib lead to some javadoc warnings. See HADOOP-3949

Hudson added a comment - 22/Aug/08 12:34 PM

Alex Loddengaard added a comment - 27/Oct/08 07:52 PM
Ari, you and Jerome, in an email thread a week or so back, mentioned that you were planning on releasing a second Chukwa patch. Any updates here?

Jerome Boulon added a comment - 27/Oct/08 08:12 PM
Hi Alex,
If you search for "chukwa" against the Jira website you'll see a list of
patches that we want to commit, but we're depending on external Apache
committers to get them committed.

/Jerome.