Hadoop Map/Reduce / MAPREDUCE-3931

MR tasks failing due to changing timestamps on Resources to download

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.2
    • Component/s: mrv2
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Changed the PB implementation of LocalResource to take locks so that race conditions no longer fail tasks by inadvertently changing the timestamps.
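
    The release note describes the fix only at a high level. As a rough illustration of the idea, and not the actual patch, the sketch below guards a record whose getter folds staged state into its built state, so that concurrent readers never see a half-merged value. LockingSketch, LockedResourceRecord, and countMismatches are hypothetical names; the real LocalResource PB class and its fields are not shown in this issue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockingSketch {

    // Hypothetical stand-in for a protobuf-backed record; NOT the Hadoop class.
    static class LockedResourceRecord {
        private long builtTimestamp;    // stands in for the built, read-only view
        private Long pendingTimestamp;  // stands in for a value staged in a builder

        // Both accessors take the object's lock, so staging and merging
        // cannot interleave with a concurrent read.
        synchronized void setTimestamp(long ts) {
            pendingTimestamp = ts;
        }

        synchronized long getTimestamp() {
            mergePending();
            return builtTimestamp;
        }

        // Only called while the lock is held. Without the lock, two threads
        // could both see a pending value and one could then read it as null.
        private void mergePending() {
            if (pendingTimestamp != null) {
                builtTimestamp = pendingTimestamp;
                pendingTimestamp = null;
            }
        }
    }

    // Reads the timestamp from several threads and counts unexpected values.
    static int countMismatches(int threads, int readsPerThread) throws InterruptedException {
        final long expected = 1330116705875L; // the "was" value from the log
        final LockedResourceRecord record = new LockedResourceRecord();
        record.setTimestamp(expected);
        final AtomicInteger mismatches = new AtomicInteger();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < readsPerThread; i++) {
                    if (i % 100 == 0) {
                        record.setTimestamp(expected); // re-stage to exercise the merge path
                    }
                    if (record.getTimestamp() != expected) {
                        mismatches.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("readers did not finish in time");
        }
        return mismatches.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("mismatched reads: " + countMismatches(8, 10000));
    }
}
```

    With the synchronized keywords in place every read returns the staged value. Removing them makes the merge step unsafe under concurrent access, which is the general class of problem the release note refers to.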

    Description

      karams reported this offline. Tasks appear to fail randomly during gridmix runs:

      2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
      java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
             at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
             at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
             at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
             at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
             at java.security.AccessController.doPrivileged(Native Method)
             at javax.security.auth.Subject.doAs(Subject.java:396)
             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
             at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
             at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
             at java.util.concurrent.FutureTask.run(FutureTask.java:138)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
             at java.util.concurrent.FutureTask.run(FutureTask.java:138)
             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
             at java.lang.Thread.run(Thread.java:619)
       at LocalTrace:
             org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
             at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
             at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
             at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
             at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
             at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
             at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
             at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
             at java.security.AccessController.doPrivileged(Native Method)
             at javax.security.auth.Subject.doAs(Subject.java:396)
             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
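
      The exception text points to a timestamp comparison at download time: the timestamp recorded for the resource is compared with the one the source filesystem reports. In the trace, the "was" value 1330116705875 is a plausible epoch-millisecond time (2012-02-24 20:51:45 UTC), while the "expected" value 2971811411 is not, which would fit a corrupted expected timestamp rather than a modified job.jar. The following is a minimal sketch of such a check, assuming plain long timestamps; TimestampCheckSketch and verifyUnchanged are hypothetical names, and this is not the FSDownload source.

```java
import java.io.IOException;

public class TimestampCheckSketch {

    // Fails when the timestamp recorded for the resource differs from the
    // one currently reported by the source filesystem.
    // The message text mirrors the log above, including its unbalanced parenthesis.
    static void verifyUnchanged(String resource, long expected, long actual) throws IOException {
        if (expected != actual) {
            throw new IOException("Resource " + resource
                    + " changed on src filesystem (expected " + expected
                    + ", was " + actual);
        }
    }

    public static void main(String[] args) {
        String jar = "hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar";
        try {
            // Values taken from the trace: the expected side does not match.
            verifyUnchanged(jar, 2971811411L, 1330116705875L);
            System.out.println("unchanged");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      A strict equality check like this fails the task whenever either side is wrong, so a race that corrupts the expected value is indistinguishable, from the task's point of view, from a file that really changed.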
      

      Attachments

        1. MR3931.txt (5 kB, Siddharth Seth)

          People

            Assignee: Siddharth Seth (sseth)
            Reporter: Vinod Kumar Vavilapalli (vinodkv)