Chukwa is designed to collect monitoring data (especially log files) and get it into HDFS as quickly as possible. Data is initially collected by a Local Agent running on each monitored machine. The Local Agent has a pluggable architecture, allowing many different adaptors to be used, each of which produces a particular stream of data. Local Agents send their data via HTTP to Collectors, which write it out into "sink files" in HDFS.
Map-reduce jobs run periodically to analyze these sink files, and to drain their contents into structured storage.
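The agent-to-collector flow above can be sketched in miniature. This is an illustrative stand-in, not Chukwa's actual code: the class name, the /chukwa endpoint, and the local temp file (standing in for an HDFS sink file) are all hypothetical, and the JDK's built-in HTTP server plays the collector's role.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the flow: adaptor output -> agent HTTP POST ->
// collector -> sink file. A local temp file stands in for HDFS.
public class ChukwaFlowSketch {
    public static void main(String[] args) throws Exception {
        Path sink = Files.createTempFile("sink", ".log");
        CountDownLatch received = new CountDownLatch(1);

        // "Collector": accepts HTTP POSTs and appends each chunk to the sink file.
        HttpServer collector = HttpServer.create(new InetSocketAddress(0), 0);
        collector.createContext("/chukwa", exchange -> {
            byte[] chunk = exchange.getRequestBody().readAllBytes();
            Files.write(sink, chunk, StandardOpenOption.APPEND);
            exchange.sendResponseHeaders(200, -1); // ack, no response body
            exchange.close();
            received.countDown();
        });
        collector.start();

        // "Agent": an adaptor produced a log line; ship it to the collector.
        String logLine = "2008-07-01 12:00:00 INFO example log line\n";
        URL url = new URL("http://localhost:"
                + collector.getAddress().getPort() + "/chukwa");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(logLine.getBytes(StandardCharsets.UTF_8));
        out.close();
        System.out.println("collector replied: " + conn.getResponseCode());

        received.await();
        collector.stop(0);
        System.out.println("sink contains: " + Files.readString(sink).trim());
    }
}
```

In the real system the sink files land in HDFS and are later drained by the periodic map-reduce jobs; this sketch only illustrates the transport shape.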
Chukwa provides a natural solution to the log collection problem posed in HADOOP-2206. Once we have Chukwa working at scale, we intend to produce some patches to Hadoop to trigger log collection appropriately.
We expect this work to ultimately be complementary to HADOOP-3585, the failure analysis system. We want to collect similar data, and our framework is flexible enough to accommodate the structure proposed there with only modest code changes on each side.
The attached document introduces Chukwa and describes the data collection architecture. It does not present our analytics and visualization in detail; we intend to describe them in a second document in the near future.