Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
When the Gobblin job lock is enabled and the job is scheduled to run every 15 minutes, I have observed the following scenario:
00 - 1st Gobblin job starts. Lock file gets created in lock-dir.
15 - 1st Gobblin job is still running. The 2nd scheduled job does not run, but afterwards the lock file gets deleted even though the 1st job is still running.
30 - 1st job is still running. The 3rd job starts since no lock file exists. Both the 1st and 3rd jobs fail.
Is this expected behavior?
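One plausible way to end up with this timeline is a launcher whose cleanup path deletes the lock file even when that run never acquired the lock. The sketch below replays the three steps under that assumption. It is not Gobblin's actual code: `LockFileRaceDemo`, `tryLock`, `launch`, and the `releaseOnlyIfOwned` flag are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class LockFileRaceDemo {

    // Atomically creates the lock file; returns false if another job already holds it.
    static boolean tryLock(Path lockFile) throws IOException {
        try {
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Hypothetical launch of one scheduled run.
    // With releaseOnlyIfOwned == false, cleanup deletes the lock file even though
    // this run may never have acquired it (the assumed bug).
    static boolean launch(Path lockFile, boolean releaseOnlyIfOwned) throws IOException {
        boolean acquired = false;
        try {
            acquired = tryLock(lockFile);
            // ... the job body would run here when acquired is true ...
            return acquired;
        } finally {
            if (acquired || !releaseOnlyIfOwned) {
                Files.deleteIfExists(lockFile);
            }
        }
    }

    // Replays the timeline and reports whether the 3rd job can start
    // while the 1st job is still running.
    static boolean thirdJobStarts(boolean releaseOnlyIfOwned) {
        try {
            Path lockFile = Files.createTempDirectory("lock-dir").resolve("job.lock");
            tryLock(lockFile);                    // minute 00: job 1 takes the lock and keeps running
            launch(lockFile, releaseOnlyIfOwned); // minute 15: job 2 finds the lock taken and is skipped
            return tryLock(lockFile);             // minute 30: job 3 tries to start
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unconditional release, job 3 starts: " + thirdJobStarts(false));
        System.out.println("owner-only release, job 3 starts: " + thirdJobStarts(true));
    }
}
```

In this sketch, releasing the lock only when the current run actually acquired it is enough to keep the 3rd job from starting on top of the 1st.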
Github Url : https://github.com/linkedin/gobblin/issues/754
Github Reporter : asrayousuf
Github Created At : 2016-02-24T18:47:54Z
Github Updated At : 2017-01-12T04:42:13Z
Comments
jbaranick wrote on 2016-02-24T19:00:16Z : @asrayousuf I noticed the same thing locally while working on a feature to make Gobblin jobs skip rather than fail when they are already locked. I'll be submitting a PR today for that feature. If taken, it should address this issue.
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-188406413
stakiar wrote on 2016-02-25T03:09:46Z : Hey, @asrayousuf that sounds like a bug. Thanks @kadaan for working on the PR.
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-188581931
asrayousuf wrote on 2016-03-08T12:34:20Z : @kadaan The changes to the AbstractJobLauncher do seem to be working, as the lock file is no longer getting deleted when a second job is started while the previous job is still running. Thanks!
@sahilTakiar could you please look into the PR #764 and close this issue accordingly. Thanks!
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-193766736
stakiar wrote on 2016-03-08T21:44:09Z : This should be fixed now that #764 has been merged, but I am going to keep this open for now because we should add a unit test to ensure the locks are working properly.
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-193980740
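A rough idea of what such a lock check could look like, as a minimal sketch: several threads race for the same lock file and exactly one should win. This does not use Gobblin's `JobLock` API. `LockContentionCheck`, `tryLock`, and `countWinners` are hypothetical stand-ins that rely only on the atomicity of `Files.createFile`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LockContentionCheck {

    // Hypothetical stand-in for a file-based job lock: an atomic create means "acquired".
    static boolean tryLock(Path lockFile) {
        try {
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts `contenders` threads that all try to take the same lock at once
    // and returns how many of them succeeded.
    static int countWinners(int contenders) {
        try {
            Path lockFile = Files.createTempDirectory("lock-dir").resolve("job.lock");
            ExecutorService pool = Executors.newFixedThreadPool(contenders);
            CountDownLatch start = new CountDownLatch(1);
            AtomicInteger winners = new AtomicInteger();
            for (int i = 0; i < contenders; i++) {
                pool.submit(() -> {
                    try {
                        start.await(); // hold every contender until all are queued
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    if (tryLock(lockFile)) {
                        winners.incrementAndGet();
                    }
                });
            }
            start.countDown();
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("contenders did not finish in time");
            }
            return winners.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("winners = " + countWinners(8));
    }
}
```

A real test would exercise the project's own lock implementation instead of the stand-in, and would also cover the release path, for example by checking that a run that failed to acquire the lock leaves the lock file in place.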
jbaranick wrote on 2016-03-08T21:48:03Z : @sahilTakiar One thing to keep in mind is that the existing file-based locks don't expire. That means if a server crashes, the lock will never be released. Internally I'm testing a Curator (ZooKeeper) based lock to solve this issue.
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-193981904
stakiar wrote on 2016-03-09T02:43:02Z : I see, something like the [Shared ReEntrant Lock](http://curator.apache.org/curator-recipes/shared-reentrant-lock.html)?
Based on [this](http://stackoverflow.com/questions/27113914/apache-zookeeper-curator-time-to-live-on-locks) Stack Overflow post, it seems the locks are based on ephemeral nodes, so if the connection dies the lock is deleted, correct?
Do you face this problem when running Gobblin on YARN? Since Gobblin on YARN already requires ZK to run, having a `JobLock` implementation backed by ZK would be really useful.
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-194081623
jbaranick wrote on 2016-03-09T02:50:31Z : @sahilTakiar Yeah, but the non-reentrant version. I do have this problem on Gobblin Yarn.
I will be submitting a PR for this tonight. I've been running it in our pre-prod environment today and it seems good. We had a problem where the app_master died and a new app_master was spun up and began to process. With FileBasedLock, this would not have worked.
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-194087655
stakiar wrote on 2016-03-09T02:57:04Z : Awesome! Yeah, I like the ZK lock a lot better.
Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-194090255