- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.23.0
- Fix Version/s: 0.23.0
- Component/s: tools/rumen
- Labels: None
- Hadoop Flags: Reviewed
- Release Note: Adds job configuration parameters to the job trace. The configuration parameters are stored under the 'jobProperties' field as key-value pairs.
- Tags: rumen, job-conf, job-properties

To emulate distributed cache usage in Gridmix jobs, the following 9 configuration properties need to be available in the trace file:
(1) mapreduce.job.cache.files
(2) mapreduce.job.cache.files.visibilities
(3) mapreduce.job.cache.files.filesizes
(4) mapreduce.job.cache.files.timestamps
(5) mapreduce.job.cache.archives
(6) mapreduce.job.cache.archives.visibilities
(7) mapreduce.job.cache.archives.filesizes
(8) mapreduce.job.cache.archives.timestamps
(9) mapreduce.job.cache.symlink.create
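
As an illustration, the check below reports which of the nine properties above are absent from one job record of a trace. It is a minimal sketch: it assumes the record has already been parsed from JSON and that the properties sit under the 'jobProperties' field as described in the release note. The helper name `missing_dist_cache_keys` and all sample values are hypothetical.

```python
import json

# The nine distributed cache properties listed in this issue.
DIST_CACHE_KEYS = [
    "mapreduce.job.cache.files",
    "mapreduce.job.cache.files.visibilities",
    "mapreduce.job.cache.files.filesizes",
    "mapreduce.job.cache.files.timestamps",
    "mapreduce.job.cache.archives",
    "mapreduce.job.cache.archives.visibilities",
    "mapreduce.job.cache.archives.filesizes",
    "mapreduce.job.cache.archives.timestamps",
    "mapreduce.job.cache.symlink.create",
]


def missing_dist_cache_keys(job_record):
    """Return the distributed cache keys absent from a job record.

    Assumes the properties are stored as a flat key-value mapping under
    'jobProperties'; a record without that field counts as missing all keys.
    """
    props = job_record.get("jobProperties") or {}
    return [key for key in DIST_CACHE_KEYS if key not in props]


# Hypothetical job record; the values are made up for illustration only.
sample = json.loads("""
{
  "jobProperties": {
    "mapreduce.job.cache.files": "hdfs://nn/tmp/lookup.txt",
    "mapreduce.job.cache.files.visibilities": "true",
    "mapreduce.job.cache.files.filesizes": "1024",
    "mapreduce.job.cache.files.timestamps": "1300000000000"
  }
}
""")

missing = missing_dist_cache_keys(sample)
print(missing)
```

A job that did not use the distributed cache may legitimately lack some of these keys, so a non-empty result is a hint to inspect the record rather than proof of a broken trace.
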
To emulate data compression in Gridmix jobs, the trace file should contain the following configuration properties:
(1) mapreduce.map.output.compress
(2) mapreduce.map.output.compress.codec
(3) mapreduce.output.fileoutputformat.compress
(4) mapreduce.output.fileoutputformat.compress.codec
(5) mapreduce.output.fileoutputformat.compress.type
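
To show how a consumer of the trace might use the five properties above, here is a hedged sketch that summarizes the compression settings of one job from its 'jobProperties' mapping. The function name `compression_settings`, the result keys, and the sample values are hypothetical. Treating a missing boolean property as "false" is an assumption of this sketch, not documented Gridmix behavior.

```python
def compression_settings(job_properties):
    """Summarize compression-related settings from a jobProperties mapping."""

    def flag(key):
        # Assumption: an absent property is treated as "false".
        return str(job_properties.get(key, "false")).strip().lower() == "true"

    return {
        "map_output_compressed": flag("mapreduce.map.output.compress"),
        "map_output_codec": job_properties.get(
            "mapreduce.map.output.compress.codec"),
        "job_output_compressed": flag(
            "mapreduce.output.fileoutputformat.compress"),
        "job_output_codec": job_properties.get(
            "mapreduce.output.fileoutputformat.compress.codec"),
        "job_output_type": job_properties.get(
            "mapreduce.output.fileoutputformat.compress.type"),
    }


# Hypothetical jobProperties content; the values are illustrative only.
sample_props = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec":
        "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.output.fileoutputformat.compress": "false",
}

settings = compression_settings(sample_props)
print(settings)
```
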
Ideally, Gridmix should also set many job-specific configuration properties, such as io.sort.mb and io.sort.factor, when running simulated jobs, so that they behave like the original jobs in terms of spilled records, number of merges, etc.
TraceBuilder should therefore bring all of these properties into the generated trace file.
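
The idea can be sketched outside of Rumen as follows. This is not TraceBuilder's implementation: it only assumes that the job configuration is available in the standard Hadoop configuration XML layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`) and shows how its entries could end up as key-value pairs under a 'jobProperties' field. The function name and the sample values are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def job_properties_from_conf_xml(xml_text):
    """Collect name/value pairs from a Hadoop-style configuration XML string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Values are kept as strings, as they appear in the XML.
            props[name.strip()] = (value or "").strip()
    return props


# Hypothetical job configuration; the values are illustrative only.
SAMPLE_CONF = """<configuration>
  <property><name>io.sort.mb</name><value>100</value></property>
  <property><name>io.sort.factor</name><value>10</value></property>
  <property><name>mapreduce.map.output.compress</name><value>true</value></property>
</configuration>
"""

# Fragment of a job record carrying the configuration as key-value pairs.
trace_fragment = {"jobProperties": job_properties_from_conf_xml(SAMPLE_CONF)}
print(json.dumps(trace_fragment, indent=2))
```
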

- blocks
  - MAPREDUCE-2407 Make Gridmix emulate usage of Distributed Cache files (Closed)
  - MAPREDUCE-2408 Make Gridmix emulate usage of data compression (Closed)
  - MAPREDUCE-2725 Make Gridmix configure job specific config properties for the simulated jobs (Open)