Hadoop Map/Reduce / MAPREDUCE-3931

MR tasks failing due to changing timestamps on Resources to download

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.2
    • Component/s: mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changed the PB implementation of LocalResource to take locks, so that race conditions no longer fail tasks by inadvertently changing the timestamps.

      Description

      Karam Singh reported this offline. Tasks appear to fail at random during gridmix runs:

      2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
      java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
             at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
             at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
             at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
             at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
             at java.security.AccessController.doPrivileged(Native Method)
             at javax.security.auth.Subject.doAs(Subject.java:396)
             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
             at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
             at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
             at java.util.concurrent.FutureTask.run(FutureTask.java:138)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
             at java.util.concurrent.FutureTask.run(FutureTask.java:138)
             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
             at java.lang.Thread.run(Thread.java:619)
       at LocalTrace:
             org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
             at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
             at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
             at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
             at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
             at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
             at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
             at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
             at java.security.AccessController.doPrivileged(Native Method)
             at javax.security.auth.Subject.doAs(Subject.java:396)
             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
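
      The release note says the fix makes the PB implementation of LocalResource take locks. The following is only a minimal sketch of that idea, not the actual patch: LocalResourceSketch, Snapshot, pendingTimestamp and viaProto are hypothetical stand-ins for a record that keeps an immutable serialized form alongside pending edits. Without the synchronized keyword, a reader could interleave with getProto() and observe a half-updated record; whether that is exactly how the bad timestamp in the trace arose is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of a record backed by an immutable snapshot plus
// pending edits. It is NOT the real LocalResourcePBImpl.
final class LocalResourceSketch {

    // Stand-in for an immutable protobuf message.
    public static final class Snapshot {
        public final long timestamp;
        Snapshot(long timestamp) { this.timestamp = timestamp; }
    }

    private Snapshot proto = new Snapshot(0L);
    private Long pendingTimestamp = null; // stand-in for a mutable builder
    private boolean viaProto = true;      // true: proto holds the current state

    // Each accessor takes the object's lock, so the (viaProto, proto,
    // pendingTimestamp) triple is always read and written as one unit.
    public synchronized long getTimestamp() {
        return viaProto ? proto.timestamp : pendingTimestamp;
    }

    public synchronized void setTimestamp(long timestamp) {
        pendingTimestamp = timestamp;
        viaProto = false;
    }

    // Folds pending edits into a fresh snapshot. Unsynchronized, a reader
    // that already saw viaProto == false could find pendingTimestamp
    // cleared by the time it reads it.
    public synchronized Snapshot getProto() {
        if (!viaProto) {
            proto = new Snapshot(pendingTimestamp);
            pendingTimestamp = null;
            viaProto = true;
        }
        return proto;
    }

    public static void main(String[] args) throws Exception {
        final LocalResourceSketch resource = new LocalResourceSketch();
        final long expected = 1330116705875L; // actual timestamp from the trace
        resource.setTimestamp(expected);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            final boolean serialize = (i % 2 == 0);
            // Half the tasks serialize the record, half just read it.
            Callable<Long> task = () -> serialize
                    ? resource.getProto().timestamp
                    : resource.getTimestamp();
            results.add(pool.submit(task));
        }
        for (Future<Long> f : results) {
            if (f.get().longValue() != expected) {
                throw new AssertionError("reader saw " + f.get());
            }
        }
        pool.shutdown();
        System.out.println("all readers saw " + expected);
    }
}
```

      Locking every accessor on the same monitor is the simplest choice here. It costs some contention, but these records are small and the critical sections are short.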
      
      1. MR3931.txt
        5 kB
        Siddharth Seth

          People

          • Assignee: Siddharth Seth
          • Reporter: Vinod Kumar Vavilapalli
          • Votes: 0
          • Watchers: 2
