Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-3931

MR tasks failing due to changing timestamps on Resources to download

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.2
    • Component/s: mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps.

      Description

      Karam Singh reported this offline. Seems that tasks are randomly failing during gridmix runs:

      2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
      java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
             at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
             at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
             at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
             at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
             at java.security.AccessController.doPrivileged(Native Method)
             at javax.security.auth.Subject.doAs(Subject.java:396)
             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
             at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
             at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
             at java.util.concurrent.FutureTask.run(FutureTask.java:138)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
             at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
             at java.util.concurrent.FutureTask.run(FutureTask.java:138)
             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
             at java.lang.Thread.run(Thread.java:619)
       at LocalTrace:
             org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
             at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
             at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
             at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
             at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
             at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
             at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
             at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
             at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
             at java.security.AccessController.doPrivileged(Native Method)
             at javax.security.auth.Subject.doAs(Subject.java:396)
             at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
      

        Attachments

        1. MR3931.txt
          5 kB
          Siddharth Seth

          Activity

            People

            • Assignee:
              sseth Siddharth Seth
              Reporter:
              vinodkv Vinod Kumar Vavilapalli
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: