Oozie
  1. Oozie
  2. OOZIE-1306

add flexibility to oozie coordinator job scheduling

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Right now, to have a job that runs everyday from 9-5pm, we need to schedule 8 identical daily coordinator job, each of which starts at one particular hour; To have a job that runs from Monday through Friday, we need 5 identical weekly jobs with different start time. This ticket aims to solve this inconvenience.

      1. ooziecronschedulingdesigndoc.pdf
        38 kB
        Bowen Zhang
      2. oozie-1306-svn.patch
        26 kB
        Bowen Zhang
      3. oozie-1306-6.patch
        40 kB
        Bowen Zhang
      4. oozie-1306-5.patch
        40 kB
        Bowen Zhang
      5. oozie-1306.patch
        27 kB
        Bowen Zhang
      6. oozie-1306.patch
        33 kB
        Bowen Zhang
      7. oozie-1306.patch
        40 kB
        Bowen Zhang

        Activity

        Hide
        Alejandro Abdelnur added a comment -

        This could be done by adding support for 'crontab' like syntax to express job frequency.

        Though, we have to think how this will affect/impact dataset instances ranges used as input/output.

        For example:

        A coordinator job running weekdays, MON-FRI will produce datasets MON-FRI.

        When a second coordinator job consumes the dataset of the first MON-FRI coordinator job, how it express it dependencies assuming this second coordinator runs daily or weekly and wants to consume a 'range' of the output of the previous coordinator job.

        It would be good to work on paper some uses cases to see how this would work and what else needs to be done so this new feature is usable and integrates with existing coordinator jobs.

        Show
        Alejandro Abdelnur added a comment - This could be done by adding support for 'crontab' like syntax to express job frequency. Though, we have to think how this will affect/impact dataset instances ranges used as input/output. For example: A coordinator job running weekdays, MON-FRI will produce datasets MON-FRI. When a second coordinator job consumes the dataset of the first MON-FRI coordinator job, how it express it dependencies assuming this second coordinator runs daily or weekly and wants to consume a 'range' of the output of the previous coordinator job. It would be good to work on paper some uses cases to see how this would work and what else needs to be done so this new feature is usable and integrates with existing coordinator jobs.
        Hide
        Mona Chitnis added a comment -

        I would like to see more concrete use-cases here too. Is this enhancement request only for flexible frequency that automatically kicks off for different start-times meaningful to that frequency (each hour for daily job, each day for weekly job ...)? Or is there a use-case for starting a coordinator at always a 'predefined' start-time, e.g. start-of-the-hour for hourly job, first day of week for weekly, first day of month for monthly and so on?

        Show
        Mona Chitnis added a comment - I would like to see more concrete use-cases here too. Is this enhancement request only for flexible frequency that automatically kicks off for different start-times meaningful to that frequency (each hour for daily job, each day for weekly job ...)? Or is there a use-case for starting a coordinator at always a 'predefined' start-time, e.g. start-of-the-hour for hourly job, first day of week for weekly, first day of month for monthly and so on?
        Hide
        Robert Kanter added a comment -

        I talked with Alejandro offline and at a high level, we should be able to support cron-like scheduling using crontab syntax. We may have to do some refactoring, but it should be possible to modify the EL functions (e.g. current(-1)) to handle cron frequencies as we're typically resolving frequencies to specific datetimes and using those anyway. So, we'd have a new time unit called "CRON" and the user would specify the crontab syntax as the frequency. We'd have to create a function that can parse crontab syntax, and some object that can handle the calculations for cron stuff. The EL functions currently just directly use the frequency and do some math, but if we do some abstraction here, we can make a version that handles cron, handles days, a version that handles months, etc.

        If we want to do this, or even something similar, we'll need to change the datatype of the frequency column in the database to a varchar instead of an int to be able to store the crontab in there; in anticipation of this, perhaps we should make this database change in Oozie 4.0 while we can, even if we don't have the cron stuff ready yet?

        Show
        Robert Kanter added a comment - I talked with Alejandro offline and at a high level, we should be able to support cron-like scheduling using crontab syntax. We may have to do some refactoring, but it should be possible to modify the EL functions (e.g. current(-1) ) to handle cron frequencies as we're typically resolving frequencies to specific datetimes and using those anyway. So, we'd have a new time unit called "CRON" and the user would specify the crontab syntax as the frequency. We'd have to create a function that can parse crontab syntax, and some object that can handle the calculations for cron stuff. The EL functions currently just directly use the frequency and do some math, but if we do some abstraction here, we can make a version that handles cron, handles days, a version that handles months, etc. If we want to do this, or even something similar, we'll need to change the datatype of the frequency column in the database to a varchar instead of an int to be able to store the crontab in there; in anticipation of this, perhaps we should make this database change in Oozie 4.0 while we can, even if we don't have the cron stuff ready yet?
        Hide
        Robert Kanter added a comment -

        Spoke with Rohini and Virag and Alejandro offline, and making the db change is fine once some other db column changes are in; I'll create a JIRA subtask.

        Show
        Robert Kanter added a comment - Spoke with Rohini and Virag and Alejandro offline, and making the db change is fine once some other db column changes are in; I'll create a JIRA subtask.
        Hide
        Robert Kanter added a comment -

        It looks like Quartz Scheduler has (among other things), some utilities for parsing and calculating cron expressions, and is Apache Licensed. Instead of re-inventing the wheel, it may make sense to use for handling some of the cron stuff:
        http://www.quartz-scheduler.org/api/2.1.7/org/quartz/CronExpression.html

        Show
        Robert Kanter added a comment - It looks like Quartz Scheduler has (among other things), some utilities for parsing and calculating cron expressions, and is Apache Licensed. Instead of re-inventing the wheel, it may make sense to use for handling some of the cron stuff: http://www.quartz-scheduler.org/api/2.1.7/org/quartz/CronExpression.html
        Hide
        Bowen Zhang added a comment -

        We explored around the Quartz Scheduler a bit. Will borrow some ideas from it.

        Show
        Bowen Zhang added a comment - We explored around the Quartz Scheduler a bit. Will borrow some ideas from it.
        Hide
        Robert Kanter added a comment -

        Bowen Zhang, it sounds like you've put more thought into this than I have so far; would you be able to explain what you've done/are doing? Does what you're thinking match up with what I posted a few comments up?

        Show
        Robert Kanter added a comment - Bowen Zhang , it sounds like you've put more thought into this than I have so far; would you be able to explain what you've done/are doing? Does what you're thinking match up with what I posted a few comments up?
        Hide
        Bowen Zhang added a comment -

        Robert Kanter, in Quartz, it has a separation between the concept of job and trigger while in oozie we don't really have a concept of trigger except using frequency for simple computation. So, you are right about creating a abstraction of computing the materializations for simple frequency and crontabs. And under this abstraction, the materialization of crontabs can be inspired by the concept of trigger in Quartz. What's your thought?

        Show
        Bowen Zhang added a comment - Robert Kanter , in Quartz, it has a separation between the concept of job and trigger while in oozie we don't really have a concept of trigger except using frequency for simple computation. So, you are right about creating a abstraction of computing the materializations for simple frequency and crontabs. And under this abstraction, the materialization of crontabs can be inspired by the concept of trigger in Quartz. What's your thought?
        Hide
        Bowen Zhang added a comment -

        Mona Chitnis, today if I want to create a coordinator job that runs from 9-5pm per hour from Mon-Friday, you need to create 5*8 =40 different coord jobs.

        Show
        Bowen Zhang added a comment - Mona Chitnis , today if I want to create a coordinator job that runs from 9-5pm per hour from Mon-Friday, you need to create 5*8 =40 different coord jobs.
        Hide
        Mona Chitnis added a comment -

        Bowen Zhang Got that point. My question was around a slightly different use case which had come up. Which was to "reset" to a certain start time, when start times are obtained programmatically from previous stages and may not align to the start of the hour, day, week or month. Has anyone else encountered this use-case or thought of current workarounds?

        Show
        Mona Chitnis added a comment - Bowen Zhang Got that point. My question was around a slightly different use case which had come up. Which was to "reset" to a certain start time, when start times are obtained programmatically from previous stages and may not align to the start of the hour, day, week or month. Has anyone else encountered this use-case or thought of current workarounds?
        Hide
        Robert Kanter added a comment -

        I believe we already have EL functions for (at least some) things like "first day of the month". The use cases we usually see are users wanting things like "every monday and tuesday at 3pm and 5pm", "every hour on Saturday and Sunday", etc.

        Show
        Robert Kanter added a comment - I believe we already have EL functions for (at least some) things like "first day of the month". The use cases we usually see are users wanting things like "every monday and tuesday at 3pm and 5pm", "every hour on Saturday and Sunday", etc.
        Hide
        Bowen Zhang added a comment -

        Robert Kanter, you are absolutely right, I miss that part.

        Show
        Bowen Zhang added a comment - Robert Kanter , you are absolutely right, I miss that part.
        Hide
        Robert Kanter added a comment -

        Comments on the patch:

        • CoordCommandUtils
          • Extra newline introduced on line 66
          • getNextValidActionTime(...)
            • freq is never used; no need to assign a variable there on line 638
            • We typically try to have one return statement in a method for readability, so instead of returning targetDate in the try block and nextTime in the catch block (and nothing in the function itself), can you rework it to have a single return statement at the end of the method?
            • Can you add some comments? I've been able to figure out some things on my own by looking at the Javadoc here, but it would be helpful in understanding the special cases you are handling with the * and ? stuff.
            • Add Javadoc
          • getNextActionMeasureTime
            • Add Javadoc
        • CoordJobsGetRunningPastEndtimeJPAExecutor
          • Add Javadoc
          • Missing license header
          • Shouldn't this take a Date like CoordActionsGetByLastModifiedTimeJPAExecutor does? When the latter is used in StatusTransitService, it uses lastInstanceStartTime to only look for the actions that have been updated since the last time the StatusTransitService ran. Using System.currentTimeMillis() in the CoordJobsGetRunningPastEndtimeJPAExecutor is probably not the same behavior.
        • StatusTransitService
          • Why did you remove the debug statement around line 665?
          • coordTransit()
            • You can replace the for loop that adds the coord ids in the returned List to the Set with the simpler coordIds.addAll(coordJobRunningPastEndtimeIdList);. Same with the previous for loop.
        • TestCoordCommandUtils
          • Unused import of DateFormat
          • testGetNextValidActionTime()
            • This is a good start, but can you add more cases? It is important to make sure that getNextValidActionTime is working properly
        • TestCoordMaterializeTransitionXCommand
          • Some of the lines are longer than 132 characters
          • As before, can you add more cases?
        • TestCoordJobsGetRunningPastEndtimeJPAExecutor
          • Add Javadoc
          • Missing license header
          • Unused import of Job
          • Can you add some more cases here as well? like some where there are other jobs that exit which shouldn't be returned by the JPA command
        • Have you done any manual testing? (you should be able to play with the computer clock to "trick" Oozie)
        • The Javadoc on CronExpression here mentions that ranges can "overflow" (e.g. 22-2) and that you can make nonsensical ranges. Can you check what would happen if the user tries that in Oozie? (you know that some user will try this )
        Show
        Robert Kanter added a comment - Comments on the patch: CoordCommandUtils Extra newline introduced on line 66 getNextValidActionTime(...) freq is never used; no need to assign a variable there on line 638 We typically try to have one return statement in a method for readability, so instead of returning targetDate in the try block and nextTime in the catch block (and nothing in the function itself), can you rework it to have a single return statement at the end of the method? Can you add some comments? I've been able to figure out some things on my own by looking at the Javadoc here , but it would be helpful in understanding the special cases you are handling with the * and ? stuff. Add Javadoc getNextActionMeasureTime Add Javadoc CoordJobsGetRunningPastEndtimeJPAExecutor Add Javadoc Missing license header Shouldn't this take a Date like CoordActionsGetByLastModifiedTimeJPAExecutor does? When the latter is used in StatusTransitService, it uses lastInstanceStartTime to only look for the actions that have been updated since the last time the StatusTransitService ran. Using System.currentTimeMillis() in the CoordJobsGetRunningPastEndtimeJPAExecutor is probably not the same behavior. StatusTransitService Why did you remove the debug statement around line 665? coordTransit() You can replace the for loop that adds the coord ids in the returned List to the Set with the simpler coordIds.addAll(coordJobRunningPastEndtimeIdList); . Same with the previous for loop. TestCoordCommandUtils Unused import of DateFormat testGetNextValidActionTime() This is a good start, but can you add more cases? It is important to make sure that getNextValidActionTime is working properly TestCoordMaterializeTransitionXCommand Some of the lines are longer than 132 characters As before, can you add more cases? TestCoordJobsGetRunningPastEndtimeJPAExecutor Add Javadoc Missing license header Unused import of Job Can you add some more cases here as well? like some where there are other jobs that exit which shouldn't be returned by the JPA command Have you done any manual testing? (you should be able to play with the computer clock to "trick" Oozie) The Javadoc on CronExpression here mentions that ranges can "overflow" (e.g. 22-2) and that you can make nonsensical ranges. Can you check what would happen if the user tries that in Oozie? (you know that some user will try this )
        Hide
        Rohini Palaniswamy added a comment -

        Can we have the patch in review board?

        Show
        Rohini Palaniswamy added a comment - Can we have the patch in review board?
        Hide
        Bowen Zhang added a comment -

        Robert Kanter, for CoordJobsGetRunningPastEndtimeJPAExecutor, we do want the current time since we are checking for all the jobs that are running past their end time at current time. And for the overflow issue, the Quartz module will pick one possible schedule should ambiguity arise. So, the users are confusing themselves in that case.

        Show
        Bowen Zhang added a comment - Robert Kanter , for CoordJobsGetRunningPastEndtimeJPAExecutor, we do want the current time since we are checking for all the jobs that are running past their end time at current time. And for the overflow issue, the Quartz module will pick one possible schedule should ambiguity arise. So, the users are confusing themselves in that case.
        Hide
        Robert Kanter added a comment -
        Show
        Robert Kanter added a comment - RB: https://reviews.apache.org/r/13362/
        Hide
        Robert Kanter added a comment -

        LGTM +1 pending Jenkins

        Rohini Palaniswamy, do you have any additional comments?

        Show
        Robert Kanter added a comment - LGTM +1 pending Jenkins Rohini Palaniswamy , do you have any additional comments?
        Hide
        Robert Kanter added a comment -

        Bowen Zhang, can you upload the latest patch to the JIRA and "submit" it so Jenkins will run it?

        Show
        Robert Kanter added a comment - Bowen Zhang , can you upload the latest patch to the JIRA and "submit" it so Jenkins will run it?
        Hide
        Bowen Zhang added a comment -

        Just did it.

        Show
        Bowen Zhang added a comment - Just did it.
        Hide
        Rohini Palaniswamy added a comment -

        Tied up at the moment. I will take a look by EOD today.

        Show
        Rohini Palaniswamy added a comment - Tied up at the moment. I will take a look by EOD today.
        Hide
        Bowen Zhang added a comment -

        Rohini Palaniswamy, made the minor change.

        Show
        Bowen Zhang added a comment - Rohini Palaniswamy , made the minor change.
        Hide
        Rohini Palaniswamy added a comment -

        +1 Pending jenkins. The patch went to the windows host again. Robert, if you can ask Tucu or Bowen if you can ask Giri and restrict it from running there it would be nice.

        Meanwhile Bowen can you try reattaching the patch again. Please number your patches so that it is easy to identify the latest one without looking at modified time.

        Show
        Rohini Palaniswamy added a comment - +1 Pending jenkins. The patch went to the windows host again. Robert, if you can ask Tucu or Bowen if you can ask Giri and restrict it from running there it would be nice. Meanwhile Bowen can you try reattaching the patch again. Please number your patches so that it is easy to identify the latest one without looking at modified time.
        Hide
        Robert Kanter added a comment -

        I thought we had already restricted the build from the windows host? I'll ask Alejandro again though.

        Show
        Robert Kanter added a comment - I thought we had already restricted the build from the windows host? I'll ask Alejandro again though.
        Hide
        Bowen Zhang added a comment -

        reattached a numbered patch.

        Show
        Bowen Zhang added a comment - reattached a numbered patch.
        Hide
        Hadoop QA added a comment -

        Testing JIRA OOZIE-1306

        Cleaning local svn workspace

        ----------------------------

        +1 PATCH_APPLIES
        +1 CLEAN
        -1 RAW_PATCH_ANALYSIS
        . +1 the patch does not introduce any @author tags
        . +1 the patch does not introduce any tabs
        . +1 the patch does not introduce any trailing spaces
        . -1 the patch contains 1 line(s) longer than 132 characters
        . +1 the patch does adds/modifies 2 testcase(s)
        +1 RAT
        . +1 the patch does not seem to introduce new RAT warnings
        +1 JAVADOC
        . +1 the patch does not seem to introduce new Javadoc warnings
        +1 COMPILE
        . +1 HEAD compiles
        . +1 patch compiles
        . +1 the patch does not seem to introduce new javac warnings
        +1 BACKWARDS_COMPATIBILITY
        . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
        . +1 the patch does not modify JPA files
        +1 TESTS
        . Tests run: 1344
        +1 DISTRO
        . +1 distro tarball builds with the patch

        ----------------------------
        -1 Overall result, please check the reported -1(s)

        The full output of the test-patch run is available at

        . https://builds.apache.org/job/oozie-trunk-precommit-build/816/

        Show
        Hadoop QA added a comment - Testing JIRA OOZIE-1306 Cleaning local svn workspace ---------------------------- +1 PATCH_APPLIES +1 CLEAN -1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . -1 the patch contains 1 line(s) longer than 132 characters . +1 the patch does adds/modifies 2 testcase(s) +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings +1 COMPILE . +1 HEAD compiles . +1 patch compiles . +1 the patch does not seem to introduce new javac warnings +1 BACKWARDS_COMPATIBILITY . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations . +1 the patch does not modify JPA files +1 TESTS . Tests run: 1344 +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- -1 Overall result, please check the reported -1(s) The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/816/
        Hide
        Bowen Zhang added a comment -

        Shorten the line.

        Show
        Bowen Zhang added a comment - Shorten the line.
        Hide
        Hadoop QA added a comment -

        Testing JIRA OOZIE-1306

        Cleaning local svn workspace

        ----------------------------

        +1 PATCH_APPLIES
        +1 CLEAN
        +1 RAW_PATCH_ANALYSIS
        . +1 the patch does not introduce any @author tags
        . +1 the patch does not introduce any tabs
        . +1 the patch does not introduce any trailing spaces
        . +1 the patch does not introduce any line longer than 132
        . +1 the patch does adds/modifies 2 testcase(s)
        +1 RAT
        . +1 the patch does not seem to introduce new RAT warnings
        +1 JAVADOC
        . +1 the patch does not seem to introduce new Javadoc warnings
        +1 COMPILE
        . +1 HEAD compiles
        . +1 patch compiles
        . +1 the patch does not seem to introduce new javac warnings
        +1 BACKWARDS_COMPATIBILITY
        . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
        . +1 the patch does not modify JPA files
        +1 TESTS
        . Tests run: 1344
        +1 DISTRO
        . +1 distro tarball builds with the patch

        ----------------------------
        +1 Overall result, good!, no -1s

        The full output of the test-patch run is available at

        . https://builds.apache.org/job/oozie-trunk-precommit-build/818/

        Show
        Hadoop QA added a comment - Testing JIRA OOZIE-1306 Cleaning local svn workspace ---------------------------- +1 PATCH_APPLIES +1 CLEAN +1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . +1 the patch does not introduce any line longer than 132 . +1 the patch does adds/modifies 2 testcase(s) +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings +1 COMPILE . +1 HEAD compiles . +1 patch compiles . +1 the patch does not seem to introduce new javac warnings +1 BACKWARDS_COMPATIBILITY . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations . +1 the patch does not modify JPA files +1 TESTS . Tests run: 1344 +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- +1 Overall result, good!, no -1s The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/818/
        Hide
        Rohini Palaniswamy added a comment -

        Committed to trunk. Thanks Bowen for adding this feature.

        Show
        Rohini Palaniswamy added a comment - Committed to trunk. Thanks Bowen for adding this feature.

          People

          • Assignee:
            Bowen Zhang
            Reporter:
            Bowen Zhang
          • Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development