Apache Gobblin / GOBBLIN-135

Gobblin Lock deleted after second scheduled run

Details

    Description

      When the Gobblin lock is enabled and the job is scheduled to run every 15 minutes, I have observed the following scenario:

      00- 1st Gobblin job starts. A lock file is created in lock-dir.
      15- 1st Gobblin job is still running. The 2nd scheduled job does not run, but afterwards the lock file is deleted even though the 1st job is still running.
      30- 1st job is still running. The 3rd job starts since no lock file exists. Both the 1st and 3rd jobs fail.

      Is this expected behavior?

      Github Url : https://github.com/linkedin/gobblin/issues/754
      Github Reporter : asrayousuf
      Github Created At : 2016-02-24T18:47:54Z
      Github Updated At : 2017-01-12T04:42:13Z
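
The scenario in the description comes down to a run that never acquired the lock still deleting the lock file during cleanup. The following is a minimal sketch of one way to avoid that, assuming a plain local lock file: the lock object remembers whether it created the file and only deletes it in that case. `OwnedFileLock` and its methods are hypothetical names used for illustration. This is not Gobblin's lock implementation, and it is not the change made in #764.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical file lock that remembers whether this instance created the lock file. */
final class OwnedFileLock {
    private final File lockFile;
    private boolean held = false; // true only if this instance created the file

    OwnedFileLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Tries to take the lock; returns false if a lock file already exists. */
    synchronized boolean tryLock() {
        if (held) {
            return false; // non-reentrant: a second attempt by the holder also fails
        }
        try {
            // createNewFile is atomic: it returns true only if the file did not exist.
            held = lockFile.createNewFile();
            return held;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes the lock file only when this instance holds the lock. */
    synchronized void unlock() {
        if (held) {
            lockFile.delete();
            held = false;
        }
        // Otherwise do nothing: the file belongs to a run that is still in progress.
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"),
                "demo-" + System.nanoTime() + ".lock");
        OwnedFileLock firstRun = new OwnedFileLock(file);
        OwnedFileLock secondRun = new OwnedFileLock(file);

        System.out.println(firstRun.tryLock());  // true: the first run takes the lock
        System.out.println(secondRun.tryLock()); // false: the second run is skipped
        secondRun.unlock();                      // cleanup by the skipped run
        System.out.println(file.exists());       // true: the first run's lock survives
        firstRun.unlock();
        System.out.println(file.exists());       // false: released by its owner
    }
}
```

With this guard, the skipped 15-minute run leaves the first run's lock file in place, so the 30-minute run is also skipped instead of starting alongside it.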

      Comments


      jbaranick wrote on 2016-02-24T19:00:16Z : @asrayousuf I noticed the same thing locally while working on a feature to cause gobblin jobs to skip rather than fail when they are already locked. I'll be submitting a PR today for that feature. If taken, it should address this issue.

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-188406413


      stakiar wrote on 2016-02-25T03:09:46Z : Hey, @asrayousuf that sounds like a bug. Thanks @kadaan for working on the PR.

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-188581931


      asrayousuf wrote on 2016-03-08T12:34:20Z : @kadaan The changes to the AbstractJobLauncher do seem to be working, as the lock file is no longer deleted when a second job is started while the previous job is still running. Thanks!

      @sahilTakiar could you please look into the PR #764 and close this issue accordingly. Thanks!

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-193766736


      stakiar wrote on 2016-03-08T21:44:09Z : This should be fixed as #764 has been merged, but I am going to keep this open for now because we should add a unit test to ensure the locks are working properly.

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-193980740


      jbaranick wrote on 2016-03-08T21:48:03Z : @sahilTakiar One thing to keep in mind is that the existing file based locks don't expire. That means if a server crashes, the lock will never be released. Internally I'm testing a Curator (zookeeper) based lock to solve this issue.

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-193981904
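
To make the point about non-expiring file locks concrete, here is a rough sketch of a lock file with a lease: a file older than the lease is treated as abandoned and can be taken over. Everything in it (`LeasedFileLock`, the lease length, the injected clock) is a hypothetical illustration, not Gobblin's file-based lock. The takeover step is not atomic, which is one reason a ZooKeeper-backed lock is the more robust choice.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.LongSupplier;

/** Hypothetical lock file that is treated as abandoned once it is older than a lease. */
final class LeasedFileLock {
    private final File lockFile;
    private final long leaseMillis;
    private final LongSupplier clock; // injected so expiry can be shown without sleeping
    private boolean held = false;

    LeasedFileLock(File lockFile, long leaseMillis, LongSupplier clock) {
        this.lockFile = lockFile;
        this.leaseMillis = leaseMillis;
        this.clock = clock;
    }

    /** Takes the lock if it is free or if the existing lock file has outlived its lease. */
    synchronized boolean tryLock() {
        if (held) {
            return false; // non-reentrant
        }
        try {
            if (lockFile.createNewFile()) {
                held = true;
                return true;
            }
            long age = clock.getAsLong() - lockFile.lastModified();
            if (age > leaseMillis) {
                // Stale: the holder presumably crashed without cleaning up.
                // Caveat: delete-then-create is not atomic, so two contenders can race here.
                held = lockFile.delete() && lockFile.createNewFile();
            }
            return held;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A live holder calls this periodically so its lock never looks stale. */
    synchronized void renew() {
        if (held) {
            lockFile.setLastModified(clock.getAsLong());
        }
    }

    /** Releases the lock only if this instance holds it. */
    synchronized void unlock() {
        if (held) {
            lockFile.delete();
            held = false;
        }
    }
}
```

A holder that crashes stops calling `renew()`, so after the lease elapses the next scheduled run can acquire the lock instead of being blocked forever. A lock without such a lease never recovers from a crashed holder.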


      stakiar wrote on 2016-03-09T02:43:02Z : I see, something like the [Shared ReEntrant Lock](http://curator.apache.org/curator-recipes/shared-reentrant-lock.html)?

      Based on [this](http://stackoverflow.com/questions/27113914/apache-zookeeper-curator-time-to-live-on-locks) Stack Overflow post, it seems the locks are based on ephemeral nodes, so if the connection dies the lock is deleted, correct?

      Do you face this problem when running Gobblin on YARN? Since Gobblin on YARN already requires ZK to run, having a `JobLock` implementation backed by ZK would be really useful.

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-194081623


      jbaranick wrote on 2016-03-09T02:50:31Z : @sahilTakiar Yeah, but the non-reentrant version. I do have this problem on Gobblin Yarn.

      I will be submitting a PR for this tonight. I've been running it in our pre-prod environment today and it seems good. We had a problem where the app_master died and a new app_master was spun up and began to process. With FileBasedLock, this would not have worked.

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-194087655


      stakiar wrote on 2016-03-09T02:57:04Z : Awesome! Yeah, I like the ZK lock a lot better.

      Github Url : https://github.com/linkedin/gobblin/issues/754#issuecomment-194090255


          People

            Unassigned Unassigned
            abti Abhishek Tiwari