Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Requirements:
a) Create a tool to support archival of logfiles (from diverse sources) in hadoop's dfs.
b) The tool should also support analysis of the logfiles via grep/sort primitives. It should allow fairly generic 'grep' patterns and let users 'sort' the matching lines (from grep) on 'columns' of their choice.
E.g. from hadoop logs: Look for all log-lines with 'FATAL' and sort them by timestamp (column x) and then by column y, i.e. on the key (column x, column y).
Design/Implementation:
a) Log Archival
Archival of logs from diverse sources can be accomplished using the distcp tool (HADOOP-341).
b) Log analysis
The idea is to enable users of the tool to perform analysis of logs via grep/sort primitives.
This can be accomplished via a relatively simple Map-Reduce task: the map does the grep for the given pattern via RegexMapper, and the implicit sort (reducer) is then used with a custom Comparator which performs the user-specified comparison on the chosen columns.
The sort/grep specs can be fairly powerful by letting the user of the tool use Java's built-in regex patterns (java.util.regex).
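As a rough illustration of the grep-then-sort idea, the sketch below runs both steps in a single JVM using only java.util.regex and a Comparator. It is not the Map-Reduce job itself and does not use RegexMapper. The class name LogGrepSort, its method names, the whitespace column separator, and the sample log lines are all hypothetical, and columns are compared as plain strings (lexicographically), which is an assumption rather than part of the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the proposed tool; all names here are illustrative.
class LogGrepSort {

    // Stand-in for the map step: keep only lines in which the pattern occurs.
    static List<String> grep(List<String> lines, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                matches.add(line);
            }
        }
        return matches;
    }

    // Stand-in for the custom Comparator: compare two lines on the given
    // zero-based columns, in order. Missing columns compare as empty strings.
    static Comparator<String> byColumns(Pattern separator, int... columns) {
        return (a, b) -> {
            String[] fa = separator.split(a);
            String[] fb = separator.split(b);
            for (int c : columns) {
                String va = c < fa.length ? fa[c] : "";
                String vb = c < fb.length ? fb[c] : "";
                int cmp = va.compareTo(vb); // lexicographic comparison (assumption)
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        };
    }

    // Grep for the regex, then sort the matching lines on the chosen columns.
    static List<String> grepAndSort(List<String> lines, String regex,
                                    String separatorRegex, int... columns) {
        List<String> matches = grep(lines, Pattern.compile(regex));
        matches.sort(byColumns(Pattern.compile(separatorRegex), columns));
        return matches;
    }

    public static void main(String[] args) {
        // Made-up log lines, for illustration only.
        List<String> log = Arrays.asList(
            "2006-07-01 12:00:05 FATAL dfs.DataNode disk failure",
            "2006-07-01 12:00:01 INFO mapred.JobTracker job started",
            "2006-07-01 12:00:01 FATAL mapred.TaskTracker lost heartbeat",
            "2006-07-01 12:00:01 FATAL dfs.NameNode edit log corrupt");
        // Grep for FATAL, then sort on column 1 (time) followed by column 3 (source).
        for (String line : grepAndSort(log, "FATAL", "\\s+", 1, 3)) {
            System.out.println(line);
        }
    }
}
```

In an actual Map-Reduce job the comparator would be applied by the framework's sort between map and reduce rather than by an in-memory List.sort; the sketch only shows how a (column x, column y) sort spec could translate into a comparison.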
Attachments
Issue Links
- depends upon: HADOOP-341 Enhance distcp to handle *http* as a 'source protocol'. (Closed)