  1. Apache Apex Core
  2. APEXCORE-498

Named Checkpoints - Checkpoint the DAG with a name/tag and start the app from that point

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Named Checkpoints

      1. Ability to tag/name the checkpoints
      2. On demand - checkpoint the DAG
      3. Start the app from the named checkpoints

      All checkpoints taken before the committed window are deleted, but named checkpoints are never deleted.
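
The retention rule above can be sketched as follows. This is a minimal illustration only, not Apex code: the class `NamedCheckpointRegistry` and its methods are hypothetical names, and it assumes "before the committed window" means strictly older than the committed window ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical registry illustrating the retention rule; not part of Apex. */
public class NamedCheckpointRegistry {
    // Window IDs of all checkpoints currently retained, in ascending order.
    private final TreeSet<Long> checkpoints = new TreeSet<>();
    // Window ID -> user-supplied name, for the checkpoints that were tagged.
    private final Map<Long, String> names = new HashMap<>();

    public void addCheckpoint(long windowId) {
        checkpoints.add(windowId);
    }

    public void nameCheckpoint(long windowId, String name) {
        if (!checkpoints.contains(windowId)) {
            throw new IllegalArgumentException("unknown checkpoint: " + windowId);
        }
        names.put(windowId, name);
    }

    /**
     * Deletes unnamed checkpoints strictly older than the committed window
     * (an assumption of this sketch) and returns the deleted window IDs.
     */
    public List<Long> purgeBefore(long committedWindowId) {
        List<Long> deleted = new ArrayList<>();
        Iterator<Long> it = checkpoints.headSet(committedWindowId, false).iterator();
        while (it.hasNext()) {
            long id = it.next();
            if (!names.containsKey(id)) {
                it.remove(); // removes from the backing set as well
                deleted.add(id);
            }
        }
        return deleted;
    }

    public List<Long> remaining() {
        return new ArrayList<>(checkpoints);
    }

    public static void main(String[] args) {
        NamedCheckpointRegistry registry = new NamedCheckpointRegistry();
        registry.addCheckpoint(10);
        registry.addCheckpoint(20);
        registry.addCheckpoint(30);
        registry.nameCheckpoint(10, "before-upgrade");
        System.out.println(registry.purgeBefore(30)); // prints [20]
        System.out.println(registry.remaining());     // prints [10, 30]
    }
}
```

Keeping the names in a separate map means the ordinary purge path only needs one extra lookup per checkpoint; how named checkpoints would eventually be released is left open here, as it is in the description.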

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user sandeshh opened a pull request:

          https://github.com/apache/apex-core/pull/369

          Review only APEXCORE-498 Savepoint feature

          This enables the following features:

          a. Store a savepoint of the running app in the location provided:
          apexCli> savepoint <AppId> <Location_to_Store> [-overWrite]

          b. Launch the app from a stored savepoint:
          apexCli> launch -savePoint <Folder_containing_savepoint>

          Limitations:
          -> A custom Storage Agent used for checkpoints is not supported (maybe in v2?)
          -> The existing recovery window is used in the savepoint, so a savepoint cannot be taken before the first recovery window

          TODO: Unit Tests

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/sandeshh/apex-core APEXCORE-498

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/apex-core/pull/369.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #369


          commit cc01813c1d103c3f9471018b67731bd8163aa18f
          Author: sandeshh <sandesh.hegde@gmail.com>
          Date: 2016-08-15T06:20:12Z

          APEXCORE-498


          githubbot ASF GitHub Bot added a comment -

          Github user sandeshh closed the pull request at:

          https://github.com/apache/apex-core/pull/369

          thw Thomas Weise added a comment -

          Sandesh, this is a good feature to have; it enables reprocessing. Can you outline how it will work, without getting into code details for now? Specifically, there are different areas where state is tracked (master, buffer servers, operator state, external systems). How will this play with all of those? Please consider end-to-end exactly-once and see if and how the proposed named checkpoints (also referred to as savepoints?) will fit in.

          thw Thomas Weise added a comment -

          Any update on this?

          sandesh Sandesh added a comment -

          Discussed this with David Yan & Pramod Immaneni. There was a concern that using this feature properly requires a lot of effort beyond the feature itself. Existing operators that use WindowDataManager won't work properly with this feature; they have to be made savepoint-compatible.

          What is your take on this, Thomas Weise?

          thw Thomas Weise added a comment -

          I think this is an important feature that should be supported. It needs to be designed properly, so I was actually expecting a design-level discussion that is visible to everyone. There will of course be changes needed to existing components, and there will also be limitations pertaining to external systems that need to be understood and discussed.

          Why should operators that use WindowDataManager not work?

          sanjaypujare Sanjay M Pujare added a comment -

          If a user decides to use savepoints (named checkpoints), the platform cannot guarantee end-to-end exactly-once, or should not be expected to guarantee it, isn't that right? Especially if the use case is reprocessing. In such a case, if some of these guarantees are relaxed, doesn't the feature boil down to just the ability to name checkpoints and restore from them (plus the related bookkeeping)?

          davidyan David Yan added a comment -

          In addition to this, there are many components and operators in Malhar that assume you never see a given window ID again after it has been committed. Managed state is one of them, and it is the foundation of multiple operators. With this feature, that assumption no longer holds, and things can go horribly wrong.
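
To make the assumption concrete, here is a minimal sketch of a component that discards per-window state once a window is committed. It is not Malhar's managed state or WindowDataManager; `CommittedWindowGuard` and its methods are hypothetical names chosen for illustration. Restarting from a savepoint older than the last committed window would replay window IDs that such a component has already purged.

```java
import java.util.TreeMap;

/** Hypothetical component illustrating the "committed windows never return" assumption. */
public class CommittedWindowGuard {
    // Per-window state that is still needed for recovery.
    private final TreeMap<Long, String> stateByWindow = new TreeMap<>();
    private long largestCommitted = -1;

    /** Saves state for a window; rejects windows at or below the committed one. */
    public void save(long windowId, String state) {
        if (windowId <= largestCommitted) {
            throw new IllegalStateException(
                "window " + windowId + " replayed after commit of " + largestCommitted);
        }
        stateByWindow.put(windowId, state);
    }

    /** Called once a window is committed: all state up to and including it is dropped. */
    public void committed(long windowId) {
        largestCommitted = Math.max(largestCommitted, windowId);
        stateByWindow.headMap(windowId, true).clear();
    }

    public int retainedWindows() {
        return stateByWindow.size();
    }

    public static void main(String[] args) {
        CommittedWindowGuard guard = new CommittedWindowGuard();
        guard.save(1, "a");
        guard.save(2, "b");
        guard.save(3, "c");
        guard.committed(3); // state for windows 1..3 is gone

        try {
            // A restart from a savepoint taken at window 1 would replay window 2.
            guard.save(2, "replayed");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real operator might fail less visibly than this sketch, for example by silently skipping or duplicating work, which is why each such component would need to be reviewed before savepoints could be relied on.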


            People

            • Assignee:
              sandesh Sandesh
              Reporter:
              sandesh Sandesh
            • Votes:
              0
              Watchers:
              5
