Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This is a proposed feature to extract state-machine views from Hadoop's logs (TaskTracker, JobTracker, and DataNode are currently supported; NameNode support is to follow soon). These views are described in http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ and will enable analysis and diagnosis algorithms to be built on top of them.
Building a full SALSA view involves two steps:
1. Incrementally parsing log entries on a per-node basis to extract states (line-by-line reading, assuming the entire log file from a given node is available to the same process)
2. "Stitching" and correlating states across all logs (across nodes and across types) to build a full state machine.
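The two steps above can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the log line layout (`<timestamp> <START|END> <state> <task_id>`), the `NodeLogParser` class, and the `stitch` and `build_view` functions are all hypothetical, and grouping states by task id is only a simplified stand-in for the correlation described in the paper.

```python
import re
from collections import defaultdict

# Hypothetical log line layout, NOT Hadoop's real log format:
#   "<timestamp> <START|END> <state> <task_id>"
LINE_RE = re.compile(r"^(\d+) (START|END) (\w+) (\S+)$")


class NodeLogParser:
    """Step 1: consume one node's log line by line and emit completed states."""

    def __init__(self, node):
        self.node = node
        self.open_states = {}  # (state, task_id) -> start timestamp

    def feed(self, line):
        """Return a completed state dict, or None if the line closes nothing."""
        m = LINE_RE.match(line.strip())
        if m is None:
            return None  # not a state-bearing line
        ts, kind, state, task_id = int(m.group(1)), m.group(2), m.group(3), m.group(4)
        key = (state, task_id)
        if kind == "START":
            self.open_states[key] = ts
            return None
        start = self.open_states.pop(key, None)
        if start is None:
            return None  # END without a matching START
        return {"node": self.node, "state": state, "task": task_id,
                "start": start, "end": ts}


def stitch(states):
    """Step 2: group states from all nodes by task id, ordered by start time."""
    by_task = defaultdict(list)
    for s in states:
        by_task[s["task"]].append(s)
    return {task: sorted(group, key=lambda s: s["start"])
            for task, group in by_task.items()}


def build_view(logs):
    """Run both steps over a mapping of node name -> list of log lines."""
    states = []
    for node, lines in logs.items():
        parser = NodeLogParser(node)  # one parser per node keeps per-node state
        for line in lines:
            done = parser.feed(line)
            if done is not None:
                states.append(done)
    return stitch(states)
```

Keeping one parser instance per node reflects the assumption in step 1 that a node's entire log is read by a single process, so an open state can always be matched with its closing line.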
My idea is to add SALSA as two jobs in the demux stage: a parsing job, which runs first, and a correlating job. There are two options for connecting them:
(a) the parsing job writes its output to the permanent store, and the correlating job reads from and writes to the permanent store, or
(b) the parsing job writes its output back to the sink file, and the correlating job reads from the sink file and writes to the permanent store.
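The difference between options (a) and (b) is only where the intermediate parsed states live. A toy sketch, assuming the sink file and the permanent store can be modelled as plain dictionaries; `parse_job`, `correlate_job`, and the record fields `kind`, `job`, and `name` are hypothetical and do not correspond to any Chukwa API:

```python
def parse_job(raw_records):
    """Hypothetical parsing job: keep only records that look like states."""
    return [r for r in raw_records if r.get("kind") == "state"]


def correlate_job(states):
    """Hypothetical correlating job: collect state names per job id."""
    view = {}
    for s in states:
        view.setdefault(s["job"], []).append(s["name"])
    return view


def option_a(sink, store):
    """(a) Parsed states go to the permanent store; correlation reads and writes there."""
    store["states"] = parse_job(sink["raw"])
    store["view"] = correlate_job(store["states"])


def option_b(sink, store):
    """(b) Parsed states go back to the sink file; only the final view reaches the store."""
    sink["states"] = parse_job(sink["raw"])
    store["view"] = correlate_job(sink["states"])
```

Both options yield the same final view in this sketch. Under (a) the permanent store also holds the intermediate states, while under (b) they remain in the sink file and the store holds only the correlated result.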
Attachments
Issue Links
- incorporates
  - CHUKWA-360 Causal stitching of SALSA states (Open)
  - CHUKWA-344 State-Machine Generation for input to SALSA visualizations (Resolved)
  - CHUKWA-279 Swimlanes visualization for Hadoop job progress (Resolved)
  - CHUKWA-299 HDFS Spatial Heatmaps (Resolved)
  - CHUKWA-342 Static Swimlanes Visualization Widget (Resolved)
  - CHUKWA-343 Static HDFS Heatmap Visualization (Resolved)