Details
Bug
Status: Open
Major
Resolution: Unresolved
Description
Hi, I'm trying to run a gobblin-yarn job and found some odd behavior.
First of all, although I set the property gobblin.yarn.work.dir, Gobblin does not use it to store anything. Instead, GobblinYarnAppLauncher creates and stores files in the _libjars, appmaster, and container directories under `'/user/<userid>/<jobName>/<jobId>'`, whereas GobblinHelixJobLauncher uses `'/user/yarn/<jobName>/<jobId>'` as its working directory. As a result, GobblinHelixJobLauncher cannot find the dependency files.
I think this is because `GobblinClusterUtils.getAppWorkDirPath` returns `/user/$USER/<appName>/<appId>` as the working directory. Maybe it should use `UserGroupInformation` when accessing the resources.
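The mismatch described above can be illustrated with a small model. This is only a sketch, not Gobblin's code: `WorkDirModel` and `appWorkDir` are hypothetical names, and the sketch assumes the work directory is rooted at the HDFS home directory of whichever user the process runs as. The application name and ID are taken from the log in the comments.

```java
import java.util.Objects;

/** Hypothetical model of a per-user application work directory (not Gobblin code). */
public class WorkDirModel {

    // Assumption: the work dir is /user/<user>/<appName>/<appId>, where <user>
    // is the user the calling process runs as.
    public static String appWorkDir(String user, String appName, String appId) {
        Objects.requireNonNull(user, "user");
        return "/user/" + user + "/" + appName + "/" + appId;
    }

    public static void main(String[] args) {
        String appName = "GobblinYarn";
        String appId = "application_1473040117010_0115";

        // The launcher runs as the submitting user; the containers run as yarn.
        String launcherDir = appWorkDir("myuser", appName, appId);
        String containerDir = appWorkDir("yarn", appName, appId);

        System.out.println("launcher uploads to : " + launcherDir + "/_libjars");
        System.out.println("container reads from: " + containerDir + "/_libjars");
        System.out.println("same directory? " + launcherDir.equals(containerDir));
    }
}
```

If both sides derive the path from their own user, the two directories differ whenever the submitting user is not the user the containers run as, which would match the "does not exist" warnings reported in the comments.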
Github Url : https://github.com/linkedin/gobblin/issues/1258
Github Reporter : chosh0615
Github Created At : 2016-09-08T13:45:32Z
Github Updated At : 2017-01-12T05:09:24Z
Comments
abti wrote on 2016-09-08T22:52:21Z : I just worked it backwards in the code, and it doesn't seem that this
property is being honored. I think it got lost in refactoring efforts.
However, the paths being used by GobblinYarnAppLauncher and
GobblinHelixJobLauncher (via GobblinApplicationMaster) are unrelated, so
should not cause any failure. Can you please share your configs and log
files?
Thanks
Abhishek
On Thu, Sep 8, 2016 at 7:15 PM, Sean notifications@github.com wrote:
> Hi, I'm trying to run a job with gobblin-yarn.
> I found some weird behaviors.
> First of all, although I set the property gobblin.yarn.work.dir, Gobblin
> does not use this property to store anything. Instead, the
> GobblinYarnAppLauncher creates and store files in _libjars, appmaster,
> container directories under '/user/<userid>/<jobName>/<jobId>' directory,
> whereas GobblinHelixJobLauncher uses '/user/yarn/<jobName>/<jobId>' as
> working directory. As a result, GobblinHelixJobLauncher cannot find
> dependency files.
> I think this is because YarnHelixUtils.getAppWorkDirPath returns
> /user/$USER/<appName>/<appId> as working directory.
>
> Is there anything to set up to run gobblin-yarn as a specific user other
> than yarn?
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-245766591
chosh0615 wrote on 2016-09-12T02:33:37Z : In my observation, the GobblinYarnAppLauncher.addAppMasterLocalResources method adds jar files to /user/myuser, but YarnService.newContainerLaunchContext adds resources from the /user/yarn directory.
The following are my configurations.
application.conf
```
# Yarn/Helix configuration properties
gobblin.yarn.helix.cluster.name=GobblinYarn
gobblin.yarn.app.queue=default
gobblin.yarn.app.name=GobblinYarn
gobblin.yarn.zk.connection.string=myhost01:2181,myhost02:2181,myhost03:2181
gobblin.yarn.app.master.memory.mbs=512
gobblin.yarn.initial.containers=1
gobblin.yarn.app.master.files.local=/home/myuser/gobblin/conf/yarn/log4j-yarn.properties,/home/myuser/gobblin/conf/yarn/application.conf,/home/myuser/gobblin/conf/yarn/reference.conf
gobblin.yarn.container.files.local=${gobblin.yarn.app.master.files.local}
gobblin.yarn.container.memory.mbs=1024
gobblin.yarn.lib.jars.dir=/home/myuser/gobblin/lib
gobblin.yarn.job.conf.path=/home/myuser/gobblin/config/yarn
gobblin.yarn.logs.sink.root.dir=/home/myuser/gobblin/logs
gobblin.yarn.work.dir=/user/myuser/work-yarn
```
reference.conf
```
# Sample configuration properties with default values
# Yarn/Helix configuration properties
gobblin.yarn.app.queue=default
gobblin.yarn.helix.cluster.name=GobblinYarn
gobblin.yarn.app.name=GobblinYarn
gobblin.yarn.app.master.memory.mbs=512
gobblin.yarn.app.master.cores=1
gobblin.yarn.app.report.interval.minutes=5
gobblin.yarn.max.get.app.report.failures=4
gobblin.yarn.email.notification.on.shutdown=false
gobblin.yarn.initial.containers=1
gobblin.yarn.container.memory.mbs=512
gobblin.yarn.container.cores=1
gobblin.yarn.container.affinity.enabled=true
gobblin.yarn.helix.instance.max.retries=2
gobblin.yarn.keytab.login.interval.minutes=1440
gobblin.yarn.token.renew.interval.minutes=720
gobblin.yarn.work.dir=/gobblin
gobblin.yarn.zk.connection.string=localhost:2181
fs.uri=hdfs://localhost:9000
job.execinfo.server.enabled=false
```
The following is part of the YARN application log.
```
11:32:21.105 [ContainerLaunchExecutor] WARN gobblin.yarn.YarnService - Path hdfs://sinbaram01:8020/user/yarn/GobblinYarn/application_1473040117010_0115/_libjars does not exist so no container LocalResource to add
11:32:21.106 [ContainerLaunchExecutor] WARN gobblin.yarn.YarnService - Path hdfs://sinbaram01:8020/user/yarn/GobblinYarn/application_1473040117010_0115/container/_appjars does not exist so no container LocalResource to add
11:32:21.107 [ContainerLaunchExecutor] WARN gobblin.yarn.YarnService - Path hdfs://sinbaram01:8020/user/yarn/GobblinYarn/application_1473040117010_0115/container/_appfiles does not exist so no container LocalResource to add
2016-09-12 11:32:21 KST INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor - Processing Event EventType: START_CONTAINER for Container container_e21_1473040117010_0115_01_000002
2016-09-12 11:32:21 KST INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData - Opening proxy : sinbaram03:45454
11:32:21.154 org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 INFO gobblin.yarn.YarnService - Container container_e21_1473040117010_0115_01_000002 has been started
11:32:21.697 [main-SendThread(sinbaram01:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x156f805c81f001e, packet:: clientPath:null serverPath:null finished:false header:: 237,4 replyHeader:: 237,128849021529,0 request:: '/GobblinYarn/PROPERTYSTORE/TaskRebalancer/Test-1.5.8-Yarn-002/Context,F response:: #7ba2020226964223a22576f726b666c6f77436f6e7465787422a20202c2273696d706c654669656c6473223a7ba202020202253544152545f54494d45223a223134373336343735333937323022a202020202c225354415445223a22494e5f50524f475245535322a20207da20202c226c6973744669656c6473223a7ba20207da20202c226d61704669656c6473223a7ba20202020224a4f425f535441544553223a7ba20202020202022546573742d312e352e382d5961726e2d3030325f6a6f625f546573742d312e352e382d5961726e2d3030325f31343733363437353338393931223a22494e5f50524f475245535322a202020207da20207da7d,s
11:32:22.698 [main-SendThread(sinbaram01:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x156f805c81f001e, packet:: clientPath:null serverPath:null finished:false header:: 238,4 replyHeader:: 238,128849021529,0 request:: '/GobblinYarn/PROPERTYSTORE/TaskRebalancer/Test-1.5.8-Yarn-002/Context,F response:: #7ba2020226964223a22576f726b666c6f77436f6e7465787422a20202c2273696d706c654669656c6473223a7ba202020202253544152545f54494d45223a223134373336343735333937323022a202020202c225354415445223a22494e5f50524f475245535322a20207da20202c226c6973744669656c6473223a7ba20207da20202c226d61704669656c6473223a7ba20202020224a4f425f535441544553223a7ba20202020202022546573742d312e352e382d5961726e2d3030325f6a6f625f546573742d312e352e382d5961726e2d3030325f31343733363437353338393931223a22494e5f50524f475245535322a202020207da20207da7d,s{128849021527,128849021527,1473647539853,1473647539853,0,0,0,0,260,0,128849021527}
11:32:23.106 [AMRM Callback Handler Thread] INFO gobblin.yarn.YarnService - Container container_e21_1473040117010_0115_01_000002 running Helix instance GobblinWorkUnitRunner_1 has completed with exit status 1
11:32:23.106 [AMRM Callback Handler Thread] INFO gobblin.yarn.YarnService - Received the following diagnostics information for container container_e21_1473040117010_0115_01_000002: Exception from container-launch.
Container id: container_e21_1473040117010_0115_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
11:32:23.108 [AMRM Callback Handler Thread] ERROR gobblin.yarn.YarnService - Received error: java.lang.NullPointerException
java.lang.NullPointerException: null
at gobblin.yarn.YarnService.handleContainerCompletion(YarnService.java:463)
at gobblin.yarn.YarnService.access$200(YarnService.java:93)
at gobblin.yarn.YarnService$AMRMClientCallbackHandler.onContainersCompleted(YarnService.java:527)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
11:32:23.109 [AMRM Callback Handler Thread] INFO gobblin.yarn.GobblinApplicationMaster - Stopping the Gobblin Yarn ApplicationMaster
```
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246229128
chosh0615 wrote on 2016-09-13T06:50:32Z : This is the log of the container on which a task ran.
```
LogType:GobblinWorkUnitRunner.stderr
Log Upload Time:Tue Sep 06 15:12:03 +0900 2016
LogLength:106
Log Contents:
Error: Cannot find or load class gobblin.yarn.GobblinWorkUnitRunner.
End of LogType:GobblinWorkUnitRunner.stderr
LogType:GobblinWorkUnitRunner.stdout
Log Upload Time:Tue Sep 06 15:12:03 +0900 2016
LogLength:0
Log Contents:
End of LogType:GobblinWorkUnitRunner.stdout
```
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246591162
jsavolainen wrote on 2016-09-13T12:29:32Z : A quick fix for this issue is to set the HADOOP_USER_NAME environment variable to yarn (or whichever user YARN runs as).
I guess the container services in GobblinApplicationMaster and GobblinYarnTaskRunner should be started inside UserGroupInformation.doAs with a user obtained from the USER environment variable, as done in org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
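A minimal sketch of why the HADOOP_USER_NAME workaround lines the paths up, assuming Hadoop "simple" (non-Kerberos) authentication, where that variable overrides the OS login name. `EffectiveUserModel`, `effectiveUser`, and `homeDir` are hypothetical names used only for illustration; this models the behavior and does not call any Hadoop API.

```java
import java.util.Map;

/** Hypothetical model of user resolution under Hadoop simple authentication. */
public class EffectiveUserModel {

    // Assumption: HADOOP_USER_NAME, when present in the environment, takes
    // precedence over the OS user the process was started by.
    public static String effectiveUser(Map<String, String> env, String osUser) {
        String override = env.get("HADOOP_USER_NAME");
        return override != null ? override : osUser;
    }

    // Assumption: the HDFS home directory is /user/<user>.
    public static String homeDir(String user) {
        return "/user/" + user;
    }

    public static void main(String[] args) {
        // Without the override, the launcher resolves to the submitting user.
        System.out.println(homeDir(effectiveUser(Map.of(), "myuser")));
        // With the workaround, it resolves to the same user as the containers.
        System.out.println(homeDir(effectiveUser(Map.of("HADOOP_USER_NAME", "yarn"), "myuser")));
    }
}
```

Under this model the workaround only helps when the variable is set in the environment of the process that launches the job, and it forces everything to run as the overriding user, which is why the UserGroupInformation.doAs approach is the more general suggestion.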
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246665620
chosh0615 wrote on 2016-09-14T04:31:01Z : Thanks, @jsavolainen. I will try it and let you know. (It will take a few days, though, due to holidays.)
I also agree with your solution (running the AM with UserGroupInformation). Hopefully I will try this as well.
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-246903723
chosh0615 wrote on 2016-09-30T11:40:42Z : Sorry for the late update.
@jsavolainen, it works with your quick fix, but it can only run as the yarn user.
So, as suggested before, it would be better to use UserGroupInformation.
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-250723993
bit1129 wrote on 2016-10-25T03:31:45Z : @chosh0615 I hit the same issue. How did you work around the problem? Did you add HADOOP_USER_NAME as an environment variable? I added export HADOOP_USER_NAME=hadoop to /etc/profile, but the problem still exists (I run YARN as the hadoop user).
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-255927054
chosh0615 wrote on 2016-10-25T13:41:53Z : @bit1129, I added the HADOOP_USER_NAME=yarn environment variable.
The working files should appear in /user/yarn on HDFS. If you still get an error, check the permissions of that directory.
Github Url : https://github.com/linkedin/gobblin/issues/1258#issuecomment-256037720