Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.23.0
None
Reviewed
Adds job configuration parameters to the job trace. The configuration parameters are stored under the 'jobProperties' field as key-value pairs.
rumen, job-conf, job-properties
Description
To emulate distributed cache usage in Gridmix jobs, the following 9 configuration properties must be available in the trace file:
(1) mapreduce.job.cache.files
(2) mapreduce.job.cache.files.visibilities
(3) mapreduce.job.cache.files.filesizes
(4) mapreduce.job.cache.files.timestamps
(5) mapreduce.job.cache.archives
(6) mapreduce.job.cache.archives.visibilities
(7) mapreduce.job.cache.archives.filesizes
(8) mapreduce.job.cache.archives.timestamps
(9) mapreduce.job.cache.symlink.create
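As a rough illustration of how a consumer of the trace might verify that these properties are present, here is a minimal Python sketch. It assumes a job's properties have already been loaded into a plain dictionary; the function name `missing_cache_properties` and the sample values are hypothetical and are not part of Rumen or Gridmix.

```python
# The nine distributed cache property names listed above.
DISTRIBUTED_CACHE_PROPERTIES = [
    "mapreduce.job.cache.files",
    "mapreduce.job.cache.files.visibilities",
    "mapreduce.job.cache.files.filesizes",
    "mapreduce.job.cache.files.timestamps",
    "mapreduce.job.cache.archives",
    "mapreduce.job.cache.archives.visibilities",
    "mapreduce.job.cache.archives.filesizes",
    "mapreduce.job.cache.archives.timestamps",
    "mapreduce.job.cache.symlink.create",
]


def missing_cache_properties(job_properties):
    """Return the listed property names that are absent from job_properties.

    Hypothetical helper for illustration only; job_properties is assumed to be
    a dict mapping property names to string values.
    """
    return [name for name in DISTRIBUTED_CACHE_PROPERTIES
            if name not in job_properties]


# Made-up sample values for a single job (assumption, not real trace data).
sample = {
    "mapreduce.job.cache.files": "hdfs://nn/tmp/a.txt",
    "mapreduce.job.cache.files.visibilities": "true",
}

for name in missing_cache_properties(sample):
    print("missing:", name)
```

A check like this only confirms that the keys exist; it says nothing about whether the values are usable for emulation.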
To emulate data compression in Gridmix jobs, the trace file should contain the following configuration properties:
(1) mapreduce.map.output.compress
(2) mapreduce.map.output.compress.codec
(3) mapreduce.output.fileoutputformat.compress
(4) mapreduce.output.fileoutputformat.compress.codec
(5) mapreduce.output.fileoutputformat.compress.type
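The following Python sketch shows one way the compression-related entries could be picked out of a job's properties. It assumes the properties are available as a dictionary of strings and that boolean properties are written as "true" or "false"; the helper names `compression_settings` and `is_enabled` and the sample values are hypothetical.

```python
# The five compression property names listed above.
COMPRESSION_PROPERTIES = (
    "mapreduce.map.output.compress",
    "mapreduce.map.output.compress.codec",
    "mapreduce.output.fileoutputformat.compress",
    "mapreduce.output.fileoutputformat.compress.codec",
    "mapreduce.output.fileoutputformat.compress.type",
)


def compression_settings(job_properties):
    """Return only the compression-related entries present in job_properties."""
    return {name: job_properties[name]
            for name in COMPRESSION_PROPERTIES
            if name in job_properties}


def is_enabled(job_properties, name):
    """Treat a property as on only if its value is the string 'true'.

    Assumption: a missing property counts as disabled.
    """
    return str(job_properties.get(name, "false")).strip().lower() == "true"


# Made-up sample properties for one job (assumption, not real trace data).
sample = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec":
        "org.apache.hadoop.io.compress.DefaultCodec",
    "io.sort.mb": "100",
}

print(compression_settings(sample))
print(is_enabled(sample, "mapreduce.map.output.compress"))
print(is_enabled(sample, "mapreduce.output.fileoutputformat.compress"))
```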
Ideally, Gridmix should also set job-specific configuration properties such as io.sort.mb and io.sort.factor when running simulated jobs, so that they reproduce the behavior of the original job in terms of spilled records, number of merges, and so on.
TraceBuilder should bring all of these properties into the generated trace file.
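To make the intended result concrete, the Python sketch below shows roughly what a trace record carrying the job configuration could look like once serialized. Only the 'jobProperties' field name comes from this issue; the "jobID" key, the job id value, and the function `build_trace_record` are illustrative assumptions, and the real TraceBuilder is a Java tool whose output layout is defined by Rumen.

```python
import json


def build_trace_record(job_id, job_conf):
    """Build a hypothetical trace record with the job configuration attached.

    The configuration is stored as key-value pairs under 'jobProperties'.
    The 'jobID' key is an assumption made for this sketch.
    """
    return {"jobID": job_id, "jobProperties": dict(job_conf)}


# Made-up configuration values for one job.
record = build_trace_record(
    "job_0001",
    {"io.sort.mb": "100", "io.sort.factor": "10"},
)

# Serialize to JSON and read it back, as a trace consumer would.
text = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(text)

print(text)
print(restored["jobProperties"]["io.sort.mb"])
```

Copying the configuration with `dict(job_conf)` keeps the record independent of later changes to the caller's dictionary.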
Attachments
Issue Links
blocks:
- MAPREDUCE-2407 Make Gridmix emulate usage of Distributed Cache files (Closed)
- MAPREDUCE-2408 Make Gridmix emulate usage of data compression (Closed)
- MAPREDUCE-2725 Make Gridmix configure job specific config properties for the simulated jobs (Open)