Details
Type: Improvement
Status: Open
Priority: Critical
Resolution: Unresolved
Description
All Hadoop logs use the same RecordType, so a single Reducer processes every log file (other than DN, NN, and Audit).
This causes a skew issue at the M/R level.
Each kind of Hadoop log should therefore use its own RecordType.
Note:
- Using the cluster information in the ChukwaRecordPartitioner will also help.
- Using a predefined list of RecordType/reducer associations will also help, by preventing two log RecordTypes from going to the same reducer.
  The dynamic assignment ((hashCode() & Integer.MAX_VALUE) % numReduceTasks) could be used as a fallback mechanism.
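
A minimal sketch of the second note, assuming a plain Java helper rather than the actual ChukwaRecordPartitioner: a predefined RecordType-to-reducer table is consulted first, and the hash-based assignment is used only as a fallback. The class name, method name, and the RecordType names in the table are hypothetical and chosen for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: not the real ChukwaRecordPartitioner.
class RecordTypePartitionSketch {

    // Hypothetical predefined RecordType -> reducer index associations.
    private static final Map<String, Integer> PREDEFINED = new HashMap<>();
    static {
        PREDEFINED.put("NameNodeLog", 0);
        PREDEFINED.put("DataNodeLog", 1);
        PREDEFINED.put("JobTrackerLog", 2);
        PREDEFINED.put("TaskTrackerLog", 3);
    }

    // Returns the reducer index for a RecordType.
    static int partitionFor(String recordType, int numReduceTasks) {
        Integer slot = PREDEFINED.get(recordType);
        // Use the predefined slot only if that reducer exists in this job.
        if (slot != null && slot < numReduceTasks) {
            return slot;
        }
        // Fallback: the dynamic assignment described in the note above.
        return (recordType.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String type : new String[] {"NameNodeLog", "DataNodeLog", "SomeOtherLog"}) {
            System.out.println(type + " -> reducer " + partitionFor(type, reducers));
        }
    }
}
```

In this sketch a RecordType that falls through to the hash can still land on a reducer that already has a predefined RecordType, so the table only guarantees separation among the types it lists.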