Apache Gobblin / GOBBLIN-84

Gobblin Yarn working directory

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Hi, I'm trying to run a gobblin-yarn job and found some unexpected behavior.
      First of all, although I set the property gobblin.yarn.work.dir, Gobblin does not use it to store anything. Instead, GobblinYarnAppLauncher creates the _libjars, appmaster, and container directories under `'/user/<userid>/<jobName>/<jobId>'` and stores its files there, whereas GobblinHelixJobLauncher uses `'/user/yarn/<jobName>/<jobId>'` as its working directory. As a result, GobblinHelixJobLauncher cannot find the dependency files.

      I think this is because `GobblinClusterUtils.getAppWorkDirPath` returns `/user/$USER/<appName>/<appId>` as working directory. Maybe it should use `UserGroupInformation` when accessing the resources.
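
      To make the mismatch concrete, here is a minimal sketch of a `<home>/<appName>/<appId>` layout built with plain strings. The class and method names (`WorkDirSketch`, `homeDir`, `appWorkDir`) are hypothetical and are not Gobblin or Hadoop APIs; the sketch only assumes that the home directory resolves to `/user/<user>` for whichever user the process runs as.

      ```java
      // Illustrative only: mimics a "<home>/<appName>/<appId>" layout with plain strings.
      // WorkDirSketch, homeDir and appWorkDir are hypothetical names, not Gobblin or Hadoop APIs.
      class WorkDirSketch {

          // Assumption: the home directory resolves to /user/<user> for the calling user.
          static String homeDir(String user) {
              return "/user/" + user;
          }

          // Builds the application working directory for a given effective user.
          static String appWorkDir(String user, String appName, String appId) {
              return homeDir(user) + "/" + appName + "/" + appId;
          }

          public static void main(String[] args) {
              String appId = "application_1473040117010_0115";
              // The launcher runs as the submitting user; the containers run as the yarn user.
              String launcherSide = appWorkDir("myuser", "GobblinYarn", appId) + "/_libjars";
              String containerSide = appWorkDir("yarn", "GobblinYarn", appId) + "/_libjars";
              System.out.println("uploaded by launcher:    " + launcherSide);
              System.out.println("looked up in containers: " + containerSide);
              System.out.println("same path? " + launcherSide.equals(containerSide));
          }
      }
      ```

      If both sides derive the path from the current user's home directory, they agree only when they run as the same user, which matches the symptom described here.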

      Github Url : https://github.com/linkedin/gobblin/issues/1258
      Github Reporter : chosh0615
      Github Created At : 2016-09-08T13:45:32Z
      Github Updated At : 2017-01-12T05:09:24Z

      Comments


      abti wrote on 2016-09-08T22:52:21Z : I just worked it backwards in the code, and it doesn't seem that this
      property is being honored. I think it got lost in refactoring efforts.

      However, the paths being used by GobblinYarnAppLauncher and
      GobblinHelixJobLauncher (via GobblinApplicationMaster) are unrelated, so
      should not cause any failure. Can you please share your configs and log
      files?

      Thanks
      Abhishek

      On Thu, Sep 8, 2016 at 7:15 PM, Sean notifications@github.com wrote:

      > Hi, I'm trying to run a job with gobblin-yarn.
      > I found some weird behaviors.
      > First of all, although I set the property gobblin.yarn.work.dir, Gobblin
      > does not use this property to store anything. Instead, the
      > GobblinYarnAppLauncher creates and stores files in _libjars, appmaster,
      > container directories under '/user/<userid>/<jobName>/<jobId>' directory,
      > whereas GobblinHelixJobLauncher uses '/user/yarn/<jobName>/<jobId>' as
      > working directory. As a result, GobblinHelixJobLauncher cannot find
      > dependency files.
      > I think this is because YarnHelixUtils.getAppWorkDirPath returns
      > /user/$USER/<appName>/<appId> as working directory.
      >
      > Is there anything to set up to run gobblin-yarn as a specific user other
      > than yarn?

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-245766591


      chosh0615 wrote on 2016-09-12T02:33:37Z : From what I observed, the GobblinYarnAppLauncher.addAppMasterLocalResources method adds jar files under /user/myuser, but YarnService.newContainerLaunchContext adds resources from the /user/yarn directory.
      My configurations are below.

      application.conf

      ```

      # Yarn/Helix configuration properties
        gobblin.yarn.helix.cluster.name=GobblinYarn
        gobblin.yarn.app.queue=default
        gobblin.yarn.app.name=GobblinYarn
        gobblin.yarn.zk.connection.string=myhost01:2181,myhost02:2181,myhost03:2181
        gobblin.yarn.app.master.memory.mbs=512
        gobblin.yarn.initial.containers=1
        gobblin.yarn.app.master.files.local=/home/myuser/gobblin/conf/yarn/log4j-yarn.properties,/home/myuser/gobblin/conf/yarn/application.conf,/home/myuser/gobblin/conf/yarn/reference.conf
        gobblin.yarn.container.files.local=${gobblin.yarn.app.master.files.local}
        gobblin.yarn.container.memory.mbs=1024
        gobblin.yarn.lib.jars.dir=/home/myuser/gobblin/lib
        gobblin.yarn.job.conf.path=/home/myuser/gobblin/config/yarn
        gobblin.yarn.logs.sink.root.dir=/home/myuser/gobblin/logs
        gobblin.yarn.work.dir=/user/myuser/work-yarn
        ```

      reference.conf

      ```

      # Sample configuration properties with default values
      # Yarn/Helix configuration properties
        gobblin.yarn.app.queue=default
        gobblin.yarn.helix.cluster.name=GobblinYarn
        gobblin.yarn.app.name=GobblinYarn
        gobblin.yarn.app.master.memory.mbs=512
        gobblin.yarn.app.master.cores=1
        gobblin.yarn.app.report.interval.minutes=5
        gobblin.yarn.max.get.app.report.failures=4
        gobblin.yarn.email.notification.on.shutdown=false
        gobblin.yarn.initial.containers=1
        gobblin.yarn.container.memory.mbs=512
        gobblin.yarn.container.cores=1
        gobblin.yarn.container.affinity.enabled=true
        gobblin.yarn.helix.instance.max.retries=2
        gobblin.yarn.keytab.login.interval.minutes=1440
        gobblin.yarn.token.renew.interval.minutes=720
        gobblin.yarn.work.dir=/gobblin
        gobblin.yarn.zk.connection.string=localhost:2181

      fs.uri=hdfs://localhost:9000

      job.execinfo.server.enabled=false
      ```

      The following is part of the YARN application log.

      ```
      11:32:21.105 [ContainerLaunchExecutor] WARN gobblin.yarn.YarnService - Path hdfs://sinbaram01:8020/user/yarn/GobblinYarn/application_1473040117010_0115/_libjars does not exist so no container LocalResource to add
      11:32:21.106 [ContainerLaunchExecutor] WARN gobblin.yarn.YarnService - Path hdfs://sinbaram01:8020/user/yarn/GobblinYarn/application_1473040117010_0115/container/_appjars does not exist so no container LocalResource to add
      11:32:21.107 [ContainerLaunchExecutor] WARN gobblin.yarn.YarnService - Path hdfs://sinbaram01:8020/user/yarn/GobblinYarn/application_1473040117010_0115/container/_appfiles does not exist so no container LocalResource to add
      2016-09-12 11:32:21 KST INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor - Processing Event EventType: START_CONTAINER for Container container_e21_1473040117010_0115_01_000002
      2016-09-12 11:32:21 KST INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData - Opening proxy : sinbaram03:45454
      11:32:21.154 org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 INFO gobblin.yarn.YarnService - Container container_e21_1473040117010_0115_01_000002 has been started
      11:32:21.697 [main-SendThread(sinbaram01:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x156f805c81f001e, packet:: clientPath:null serverPath:null finished:false header:: 237,4 replyHeader:: 237,128849021529,0 request:: '/GobblinYarn/PROPERTYSTORE/TaskRebalancer/Test-1.5.8-Yarn-002/Context,F response:: #7ba2020226964223a22576f726b666c6f77436f6e7465787422a20202c2273696d706c654669656c6473223a7ba202020202253544152545f54494d45223a223134373336343735333937323022a202020202c225354415445223a22494e5f50524f475245535322a20207da20202c226c6973744669656c6473223a7ba20207da20202c226d61704669656c6473223a7ba20202020224a4f425f535441544553223a7ba20202020202022546573742d312e352e382d5961726e2d3030325f6a6f625f546573742d312e352e382d5961726e2d3030325f31343733363437353338393931223a22494e5f50524f475245535322a202020207da20207da7d,s{128849021527,128849021527,1473647539853,1473647539853,0,0,0,0,260,0,128849021527}
      11:32:22.698 [main-SendThread(sinbaram01:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x156f805c81f001e, packet:: clientPath:null serverPath:null finished:false header:: 238,4 replyHeader:: 238,128849021529,0 request:: '/GobblinYarn/PROPERTYSTORE/TaskRebalancer/Test-1.5.8-Yarn-002/Context,F response:: #7ba2020226964223a22576f726b666c6f77436f6e7465787422a20202c2273696d706c654669656c6473223a7ba202020202253544152545f54494d45223a223134373336343735333937323022a202020202c225354415445223a22494e5f50524f475245535322a20207da20202c226c6973744669656c6473223a7ba20207da20202c226d61704669656c6473223a7ba20202020224a4f425f535441544553223a7ba20202020202022546573742d312e352e382d5961726e2d3030325f6a6f625f546573742d312e352e382d5961726e2d3030325f31343733363437353338393931223a22494e5f50524f475245535322a202020207da20207da7d,s{128849021527,128849021527,1473647539853,1473647539853,0,0,0,0,260,0,128849021527}


      11:32:23.106 [AMRM Callback Handler Thread] INFO gobblin.yarn.YarnService - Container container_e21_1473040117010_0115_01_000002 running Helix instance GobblinWorkUnitRunner_1 has completed with exit status 1
      11:32:23.106 [AMRM Callback Handler Thread] INFO gobblin.yarn.YarnService - Received the following diagnostics information for container container_e21_1473040117010_0115_01_000002: Exception from container-launch.
      Container id: container_e21_1473040117010_0115_01_000002
      Exit code: 1
      Stack trace: ExitCodeException exitCode=1:
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
      at org.apache.hadoop.util.Shell.run(Shell.java:487)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Container exited with a non-zero exit code 1

      11:32:23.108 [AMRM Callback Handler Thread] ERROR gobblin.yarn.YarnService - Received error: java.lang.NullPointerException
      java.lang.NullPointerException: null
      at gobblin.yarn.YarnService.handleContainerCompletion(YarnService.java:463)
      at gobblin.yarn.YarnService.access$200(YarnService.java:93)
      at gobblin.yarn.YarnService$AMRMClientCallbackHandler.onContainersCompleted(YarnService.java:527)
      at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
      11:32:23.109 [AMRM Callback Handler Thread] INFO gobblin.yarn.GobblinApplicationMaster - Stopping the Gobblin Yarn ApplicationMaster
      ```

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246229128


      chosh0615 wrote on 2016-09-13T06:50:32Z : This is the log of the container on which a task ran.

      ```
      LogType:GobblinWorkUnitRunner.stderr
      Log Upload Time:Tue Sep 06 15:12:03 +0900 2016
      LogLength:106
      Log Contents:
      Error: Cannot find or load class gobblin.yarn.GobblinWorkUnitRunner.
      End of LogType:GobblinWorkUnitRunner.stderr

      LogType:GobblinWorkUnitRunner.stdout
      Log Upload Time:Tue Sep 06 15:12:03 +0900 2016
      LogLength:0
      Log Contents:
      End of LogType:GobblinWorkUnitRunner.stdout
      ```

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246591162


      jsavolainen wrote on 2016-09-13T12:29:32Z : A quick fix for this issue is to set the HADOOP_USER_NAME environment variable to yarn (or whichever user YARN runs as).

      I guess the container services in GobblinApplicationMaster and GobblinYarnTaskRunner should be started inside UserGroupInformation.doAs with a user obtained from the USER environment variable, as is done in org.apache.hadoop.mapreduce.v2.app.MRAppMaster.

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246665620


      chosh0615 wrote on 2016-09-14T04:31:01Z : Thanks, @jsavolainen. I will try it and let you know. (It will take a few days, though, due to holidays.)
      I also agree with your proposed solution (running the AM with UserGroupInformation). Hopefully I will be able to try that as well.

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246903723


      chosh0615 wrote on 2016-09-30T11:40:42Z : Sorry for the late update.
      @jsavolainen, it works with your quick fix, but then the job can only run as the yarn user.
      So, as suggested before, it would be better to use UserGroupInformation.

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-250723993


      bit1129 wrote on 2016-10-25T03:31:45Z : @chosh0615 I hit the same issue. How did you work around the problem? Did you add HADOOP_USER_NAME as an environment variable? I added export HADOOP_USER_NAME=hadoop to /etc/profile, but the problem still exists. (I am using the hadoop user to run YARN.)

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-255927054


      chosh0615 wrote on 2016-10-25T13:41:53Z : @bit1129, I added the environment variable HADOOP_USER_NAME=yarn.
      The working files should then appear under /user/yarn on HDFS. If you still get an error, check the permissions of that directory.
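
      When checking whether the variable is actually visible to the launching process, a small sketch like the one below may help. It approximates the lookup order that Hadoop's simple (non-Kerberos) authentication is generally understood to use: the HADOOP_USER_NAME environment variable, then a system property of the same name, then the OS user. The class and method names are hypothetical, and the lookup order is an assumption to verify against your Hadoop version.

      ```java
      import java.util.Map;
      import java.util.Properties;

      // Illustrative only: approximates how the effective Hadoop user might be resolved
      // under simple authentication. EffectiveUserSketch is a hypothetical name, and the
      // lookup order is an assumption, not a copy of Hadoop's implementation.
      class EffectiveUserSketch {

          static String effectiveUser(Map<String, String> env, Properties props) {
              // 1. Environment variable, if set and non-empty.
              String fromEnv = env.get("HADOOP_USER_NAME");
              if (fromEnv != null && !fromEnv.isEmpty()) {
                  return fromEnv;
              }
              // 2. JVM system property of the same name.
              String fromProp = props.getProperty("HADOOP_USER_NAME");
              if (fromProp != null && !fromProp.isEmpty()) {
                  return fromProp;
              }
              // 3. Fall back to the OS user running the JVM.
              return props.getProperty("user.name");
          }

          public static void main(String[] args) {
              String user = effectiveUser(System.getenv(), System.getProperties());
              System.out.println("effective user: " + user);
              System.out.println("expected work dir root: /user/" + user);
          }
      }
      ```

      Running this from the same shell or service that starts the launcher shows whether an export placed in a file such as /etc/profile has reached that process; a non-login shell or a daemon may not have read that file.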

      Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-256037720

          People

            Assignee: Unassigned
            Reporter: Sean Cho (chosh0615)