  1. Samza
  2. SAMZA-922

Host Affinity - Bug in SamzaContainerRequest causes (recoverable) exceptions in YARN

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.1
    • Component/s: None
    • Labels: None

    Description

      The constructor for SamzaContainerRequest builds the YARN container request differently depending on whether a preferred host is given. However, it only checks preferredHost == null and does not check preferredHost.equals(ANY_HOST), even though ANY_HOST is the string passed when there is no preferred host.

      As a result, the YARN container request asks for a container on a host literally named "ANY_HOST", which causes the following exception:

      2016-03-29 21:25:53.892 [main] ScriptBasedMapping [WARN] Exception running /OMITTED/sbin/yarn-topology.py ANY_HOST
      java.io.IOException: Cannot run program "/OMITTED/application_1452292535523_0047/container_1452292535523_0047_02_000001"): error=2, No such file or directory
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
      at org.apache.hadoop.util.Shell.run(Shell.java:455)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
      at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
      at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
      at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
      at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
      at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
      at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.resolveRacks(AMRMClientImpl.java:551)
      at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:411)
      at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
      at org.apache.samza.job.yarn.ContainerRequestState.updateRequestState(ContainerRequestState.java:82)
      at org.apache.samza.job.yarn.AbstractContainerAllocator.requestContainer(AbstractContainerAllocator.java:102)
      at org.apache.samza.job.yarn.AbstractContainerAllocator.requestContainers(AbstractContainerAllocator.java:85)
      at org.apache.samza.job.yarn.SamzaTaskManager.onInit(SamzaTaskManager.java:112)
      at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:117)
      at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:117)
      at scala.collection.immutable.List.foreach(List.scala:318)
      at org.apache.samza.job.yarn.SamzaAppMaster$.run(SamzaAppMaster.scala:117)
      at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:104)
      at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
      Caused by: java.io.IOException: error=2, No such file or directory
      at java.lang.UNIXProcess.forkAndExec(Native Method)
      at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
      at java.lang.ProcessImpl.start(ProcessImpl.java:134)
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)

      The exception is recoverable when relaxed locality = true, because YARN then falls back to a random host on the default rack, which is what the ANY_HOST request was meant to achieve. However, the behavior is incorrect, and the stack traces tend to fill the log.

      The string "ANY_HOST" is internal to Samza, and YARN should never see it.
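
      The check described above can be illustrated with a minimal, self-contained sketch. It is not the attached patch: the class and method names (ContainerRequestSketch, nodesBuggy, nodesFixed) are hypothetical, and the sketch only models the node list that would be handed to the YARN container request, on the assumption that a null node list means "no host constraint".

```java
import java.util.Arrays;

public class ContainerRequestSketch {
    // Assumed to mirror Samza's internal "no preference" marker; the value
    // matches the host name that shows up in the stack trace.
    static final String ANY_HOST = "ANY_HOST";

    // Behavior as described in this issue: only null counts as "no preference",
    // so the ANY_HOST marker is passed through as if it were a real host name.
    static String[] nodesBuggy(String preferredHost) {
        return preferredHost == null ? null : new String[] {preferredHost};
    }

    // Sketch of a fix: treat ANY_HOST the same as null so the marker is
    // never sent to YARN as a host name.
    static String[] nodesFixed(String preferredHost) {
        if (preferredHost == null || ANY_HOST.equals(preferredHost)) {
            return null; // assumption: null node list = no host constraint
        }
        return new String[] {preferredHost};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nodesBuggy(ANY_HOST)));  // prints [ANY_HOST]
        System.out.println(Arrays.toString(nodesFixed(ANY_HOST)));  // prints null
        System.out.println(Arrays.toString(nodesFixed("host-1")));  // prints [host-1]
    }
}
```

      Writing the comparison as ANY_HOST.equals(preferredHost) rather than preferredHost.equals(ANY_HOST) keeps the check null-safe even if the order of the two conditions is later changed.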

      Attachments

        1. SAMZA-922.patch
          3 kB
          Jake Maes

            People

              Assignee: jmakes Jake Maes
              Reporter: jmakes Jake Maes
              Votes: 0
              Watchers: 2
