Details
-
Improvement
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
2.6.0
-
None
Description
I have noticed repeatedly that the map tasks take a lot of time to startup on YARN clusters. This is not the scheduling part, this is after the actual container is launched containing the Map task. Take for example, the sample log from a mapper of a Pi job that I launched. The command I used to launch the Pi job was:
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop*mapreduce*examples*jar pi 10 100
This is the sample job from one of the mappers which took 14 seconds to complete. If you notice from the logs, most of the time taken by this job is during the start up. I notice that the most mappers take anywhere between 7 to 15 seconds during start up and have seen this behavior consistent across mapreduce jobs. This really affects the performance of short running mappers.
I run a hadoop2 / yarn cluster on a 4-5 node m1.xlarge cluster, and the mapper memory is always specified as 2048m and so on.
Log:
2015-04-18 06:48:34,081 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens: 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1429338752209_0059, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5d48e5d6) 2015-04-18 06:48:35,391 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now. 2015-04-18 06:48:36,656 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /media/ephemeral3/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral1/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral0/yarn/local/usercache/rjain/appcache/application_1429338752209_0059 2015-04-18 06:48:36,706 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-18 06:48:37,387 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-18 06:48:39,388 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 2015-04-18 06:48:39,448 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-18 06:48:41,060 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem: setting Progress to org.apache.hadoop.mapred.Task$TaskReporter@601211d0 comment setting up progress from Task 2015-04-18 06:48:41,098 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ] 2015-04-18 06:48:41,585 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://ec2-54-211-109-245.compute-1.amazonaws.com:9000/user/rjain/QuasiMonteCarlo_1429339685772_504558444/in/part4:0+118 2015-04-18 06:48:43,926 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 234881020(939524080) 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 896 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 657666880 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 939524096 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 234881020; length = 58720256 2015-04-18 06:48:43,946 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 939524096 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 234881020(939524080); kvend = 234881016(939524064); length = 5/58720256 2015-04-18 06:48:44,065 INFO [main] org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext$DirSelector: Returning directory: /media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059/attempt_1429338752209_0059_m_000004_0_spill_0.out 2015-04-18 06:48:44,089 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy] 2015-04-18 06:48:44,100 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0 2015-04-18 06:48:44,111 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1429338752209_0059_m_000004_0 is done. And is in the process of committing 2015-04-18 06:48:44,164 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem: setting Progress to null comment clearing progress now that task is done 2015-04-18 06:48:44,191 INFO [main] org.apache.hadoop.mapred.Task: Counters for attempt_1429338752209_0059_m_000004_0 2015-04-18 06:48:44,203 INFO [main] org.apache.hadoop.mapred.Task: File System Counters 2015-04-18 06:48:44,206 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of bytes read = 0 2015-04-18 06:48:44,206 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of bytes written = 116166 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of read operations = 0 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of large read operations = 0 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of write operations = 0 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of bytes read = 297 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of bytes written = 0 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of read operations = 4 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of large read operations = 0 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of write operations = 0 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: Map-Reduce Framework 2015-04-18 06:48:44,208 INFO [main] org.apache.hadoop.mapred.Task: Map input records = 1 2015-04-18 06:48:44,208 INFO [main] org.apache.hadoop.mapred.Task: Map output records = 2 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Map output bytes = 18 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Map output materialized bytes = 34 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Input split bytes = 179 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Combine input records = 0 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Spilled Records = 2 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Failed Shuffles = 0 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Merged Map outputs = 0 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: GC time elapsed (ms) = 158 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: CPU time spent (ms) = 2080 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Physical memory (bytes) snapshot = 1104715776 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Virtual memory (bytes) snapshot = 2900013056 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Total committed heap usage (bytes) = 1251475456 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: File Input Format Counters 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Bytes Read = 118 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: File System Counters 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of bytes read = 0 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of bytes written = 116166 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of read operations = 0 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of large read operations = 0 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: FILE: Number of write operations = 0 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of bytes read = 297 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of bytes written = 0 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of read operations = 4 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of large read operations = 0 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: Number of write operations = 0 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: Map-Reduce Framework 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: Map input records = 1 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Map output records = 2 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Map output bytes = 18 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Map output materialized bytes = 34 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Input split bytes = 179 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Combine input records = 0 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Spilled Records = 2 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Failed Shuffles = 0 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Merged Map outputs = 0 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: GC time elapsed (ms) = 158 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: CPU time spent (ms) = 2080 2015-04-18 06:48:44,213 INFO [main] org.apache.hadoop.mapred.Task: Physical memory (bytes) snapshot = 1104715776 2015-04-18 06:48:44,213 INFO [main] org.apache.hadoop.mapred.Task: Virtual memory (bytes) snapshot = 2900013056 2015-04-18 06:48:44,213 INFO [main] org.apache.hadoop.mapred.Task: Total committed heap usage (bytes) = 1251475456 2015-04-18 06:48:44,214 INFO [main] org.apache.hadoop.mapred.Task: File path: /media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059/output/attempt_1429338752209_0059_m_000004_0/file.out 2015-04-18 06:48:44,223 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1429338752209_0059_m_000004_0' done. 2015-04-18 06:48:44,225 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system... 2015-04-18 06:48:44,231 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped. 2015-04-18 06:48:44,233 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.