FLUME-1516

FileChannel Write Dual Checkpoints to avoid replays

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.3.0
    • Fix Version/s: v1.4.0
    • Component/s: Channel, File Channel
    • Labels:
      None

      Description

      Per the LFS paper (http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf), we can write two checkpoints to avoid replaying the logs if we crash or shut down while writing a checkpoint.

      Section 4:

      "In order to handle a crash during a checkpoint operation there are actually two checkpoint regions, and checkpoint operations alternate between them. The checkpoint time is in the last block of the checkpoint so if the checkpoint fails the time will not be updated. During reboot, the system reads both checkpoint regions and uses the one with the most recent time."
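
The scheme in the quoted passage can be sketched in a few lines. This is a minimal in-memory illustration, not FileChannel code: the class name `DualCheckpointRegions`, its methods, and the `crashBeforeTimestamp` flag are all hypothetical and exist only to show why a torn write never wins at recovery time.

```java
/** Hypothetical in-memory model of two alternating checkpoint regions. */
final class DualCheckpointRegions {
  private final String[] payload = new String[2];
  private final long[] time = {-1L, -1L}; // -1 means "never completed"
  private int next = 0;                   // region the next checkpoint goes to

  /**
   * Writes a checkpoint into the older region. The timestamp is written
   * last, so a crash before that point leaves the old timestamp in place.
   */
  void checkpoint(String data, long now, boolean crashBeforeTimestamp) {
    payload[next] = data;            // body first
    if (crashBeforeTimestamp) {
      return;                        // simulated crash: time not updated
    }
    time[next] = now;                // timestamp last
    next = 1 - next;                 // alternate regions
  }

  /** Reads both regions and returns the one with the most recent time. */
  String recover() {
    int newest = time[0] >= time[1] ? 0 : 1;
    return time[newest] < 0 ? null : payload[newest];
  }

  public static void main(String[] args) {
    DualCheckpointRegions r = new DualCheckpointRegions();
    r.checkpoint("A", 1L, false);
    r.checkpoint("B", 2L, false);
    r.checkpoint("C", 3L, true);     // crash while overwriting region A
    System.out.println(r.recover()); // prints "B"
  }
}
```

The torn write only ever lands in the older region, so the region holding the last complete checkpoint keeps the larger timestamp and is the one recovery picks.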

      Attachments

      1. FLUME-1516-8.patch (97 kB, Hari Shreedharan)
      2. FLUME-1516-7.patch (92 kB, Hari Shreedharan)
      3. FLUME-1516-6.patch (83 kB, Hari Shreedharan)
      4. FLUME-1516-5.patch (83 kB, Hari Shreedharan)
      5. FLUME-1516-4.patch (76 kB, Hari Shreedharan)
      6. FLUME-1516-3.patch (76 kB, Hari Shreedharan)
      7. FLUME-1516-2.patch (70 kB, Hari Shreedharan)
      8. DualCheckpointsv3.pdf (79 kB, Hari Shreedharan)
      9. FLUME-1516-1.patch (44 kB, Hari Shreedharan)
      10. FLUME-1516.patch (44 kB, Hari Shreedharan)
      11. DualCheckpointsv2.pdf (106 kB, Hari Shreedharan)
      12. DualCheckpoints.pdf (96 kB, Hari Shreedharan)

          Activity

          Hari Shreedharan added a comment -

          We could make sure the writes always happen to a temporary file - like checkpoint.tmp or something, and then do a rename. So the replay logic does not need to change. Simply replay from the file named "checkpoint."

          Brock Noland added a comment -

          Good call, that is much simpler.

          Ted Malaska added a comment -

          Hmm I'm trying to follow.

          Hari is saying the writing process is as follows:
          1. Write event A to checkpoint.tmp
          2. Rename checkpoint to checkpoint.old
          3. Rename checkpoint.tmp to checkpoint
          4. Write event A to checkpoint.old
          5. Rename checkpoint.old to checkpoint.tmp

          Then the reading process would be as follows:
          1. Try to read from checkpoint
          2. If checkpoint is not there then try to read from checkpoint.tmp
          3. else read from checkpoint.old (this shouldn't happen)

          Let me know if this aligns with your thinking. If it does I will attempt to write the fix.

          Brock Noland added a comment -

          Hi Ted,

          I think what we want to do is something like this:

          1) On startup copy checkpoint to checkpoint.tmp
          2) When checkpointing
          2.1) write data to checkpoint.tmp
          2.2) Rename checkpoint.tmp to checkpoint
          2.3) In background, copy checkpoint to checkpoint.tmp for the next checkpoint

          However, there are a ton of pending changes to this code in FLUME-1487 so any change might be best delayed until after we merge that patch. If we want to tackle it now we certainly could, just trying to save myself a little work!

          Hari Shreedharan added a comment -

          Checkpoint is written to checkpoint.tmp. Once the whole thing is written out, then rename checkpoint.tmp to checkpoint. So during startup, if the "checkpoint" file exists, we know it is a checkpoint which was completely written.
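
A minimal sketch of the write-then-rename idea described above, using only standard `java.nio.file` calls. The class `TmpRenameCheckpoint` and its `write`, `read`, and `demo` methods are hypothetical and are not taken from the FileChannel code. One assumption to note: whether `ATOMIC_MOVE` replaces an existing target is implementation specific according to the `Files.move` documentation; POSIX file systems typically replace it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: write to checkpoint.tmp, then rename to checkpoint. */
final class TmpRenameCheckpoint {

  static void write(Path dir, byte[] data) {
    Path tmp = dir.resolve("checkpoint.tmp");
    try {
      try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
          StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
          ch.write(buf);
        }
        ch.force(true); // flush contents to disk before the rename
      }
      // Only a fully written file ever gets the name "checkpoint".
      Files.move(tmp, dir.resolve("checkpoint"), StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the last complete checkpoint, or null if none exists. */
  static String read(Path dir) {
    Path cp = dir.resolve("checkpoint");
    try {
      return Files.exists(cp)
          ? new String(Files.readAllBytes(cp), StandardCharsets.UTF_8)
          : null;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Two complete checkpoints, then a simulated crash mid-write. */
  static String demo() {
    try {
      Path dir = Files.createTempDirectory("cp-demo");
      write(dir, "one".getBytes(StandardCharsets.UTF_8));
      write(dir, "two".getBytes(StandardCharsets.UTF_8));
      // Partial temp file that was never renamed.
      Files.write(dir.resolve("checkpoint.tmp"),
          "thr".getBytes(StandardCharsets.UTF_8));
      return read(dir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because the reader only ever opens the file named `checkpoint`, a leftover `checkpoint.tmp` from an interrupted write is simply ignored and overwritten by the next checkpoint.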

          Ted Malaska added a comment -

          Cool. I will wait for your call Brock.

          Thanks Hari. I will review the code with your comment in mind.

          Hari Shreedharan added a comment - - edited

          Ted Malaska Are you still working on this one? I am planning to work on this soon. I have some ideas for this. I will post a concise design document for this in a while.

          Ted Malaska added a comment -

          Hey Hari,

          No I haven't started this one. And my schedule is heavy right now.

          Thanks for taking this up.

          Hari Shreedharan added a comment -

          Thanks Ted.

          Brock Noland added a comment -

          Hari,

          Great to see a patch on this! How is the design document coming along?

          Hari Shreedharan added a comment -

          It seems I forgot to compile the tex file and attach it to the jira. Sorry about that.

          Hari Shreedharan added a comment -

          Updated design doc to handle corner cases for deletion of data files. Also clarified both the algorithms a bit more (and used a better font).

          Brock Noland added a comment -

          Hari,

          I gave the design doc a quick review. I like design #2 (nice work!) and I am glad we are going with that one. I'll have more feedback soon.

          Hari Shreedharan added a comment -

          The current design has a slight limitation:

          • If a checkpoint is corrupted after it has been completely written (that is, the corruption was not caused by the channel being killed while a checkpoint was in progress), the log files will carry the checkpoint write order ID of a newer checkpoint than the restored one. So we need to modify the log file metadata format to also keep the previous checkpoint's information; otherwise we would do a full replay of the files even though a checkpoint is available.
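
One way to picture the proposed metadata change is sketched below. Everything here is an assumption made for illustration: the class `LogMetaSketch`, its field names, and the convention that 0 means "replay the whole file" are hypothetical and do not reflect the actual log file format.

```java
/** Hypothetical log-file metadata that remembers the previous checkpoint too. */
final class LogMetaSketch {
  private long checkpointWriteOrderId = -1L;
  private long checkpointPosition = -1L;
  private long backupCheckpointWriteOrderId = -1L;
  private long backupCheckpointPosition = -1L;

  /** Records a new checkpoint and demotes the previous one to "backup". */
  void markCheckpoint(long writeOrderId, long position) {
    backupCheckpointWriteOrderId = checkpointWriteOrderId;
    backupCheckpointPosition = checkpointPosition;
    checkpointWriteOrderId = writeOrderId;
    checkpointPosition = position;
  }

  /**
   * Returns the offset to start replaying from for the checkpoint that was
   * actually restored, or 0 (full replay) when neither recorded ID matches.
   */
  long replayStart(long restoredWriteOrderId) {
    if (restoredWriteOrderId < 0) {
      return 0L;
    }
    if (restoredWriteOrderId == checkpointWriteOrderId) {
      return checkpointPosition;
    }
    if (restoredWriteOrderId == backupCheckpointWriteOrderId) {
      return backupCheckpointPosition;
    }
    return 0L;
  }

  public static void main(String[] args) {
    LogMetaSketch meta = new LogMetaSketch();
    meta.markCheckpoint(10L, 100L);
    meta.markCheckpoint(20L, 250L);
    System.out.println(meta.replayStart(20L)); // newest checkpoint restored
    System.out.println(meta.replayStart(10L)); // older (backup) checkpoint restored
  }
}
```

With only the newest ID recorded, restoring the older checkpoint would fall through to the full-replay branch, which is the limitation described above.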
          Hari Shreedharan added a comment -

          Updated design document.

          Hari Shreedharan added a comment -

          Patch reflecting the updated design.

          Hari Shreedharan added a comment -

          Rebased on trunk

          Hari Shreedharan added a comment -

          Further rebased.

          Flume QA added a comment -

          Here are the results of testing the latest attachment
          https://issues.apache.org/jira/secure/attachment/12573799/FLUME-1516-5.patch against FLUME-1787.

          Overall: -1 due to an error

          ERROR: git merge failed

          Console output: https://builds.apache.org/job/PreCommit-FLUME-Build/23/console

          This message is automatically generated.

          Hari Shreedharan added a comment -

          Sorry about the noise, was trying out the precommit job.

          Flume QA added a comment -

          Here are the results of testing the latest attachment
          https://issues.apache.org/jira/secure/attachment/12575386/FLUME-1516-6.patch against trunk.

          Overall: -1 due to an error

          ERROR: failed to build with patch (exit code 1)

          Console output: https://builds.apache.org/job/PreCommit-FLUME-Build/24/console

          This message is automatically generated.

          Brock Noland added a comment -

          Job failed with:

          -----------------------------------------------------
          [INFO] ------------------------------------------------------------------------
          [ERROR] BUILD ERROR
          [INFO] ------------------------------------------------------------------------
          [INFO] Error getting reports from the plugin 'org.tomdz.maven:sphinx-maven-plugin:1.0.2': Unable to load the mojo 'org.tomdz.maven:sphinx-maven-plugin:1.0.2:generate' in the plugin 'org.tomdz.maven:sphinx-maven-plugin'. A required class is missing: org/codehaus/plexus/util/xml/XmlStreamWriter
          org.codehaus.plexus.util.xml.XmlStreamWriter
          [INFO] ------------------------------------------------------------------------
          [INFO] For more information, run Maven with the -e switch
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 39 seconds
          [INFO] Finished at: Mon Mar 25 21:23:56 UTC 2013
          [INFO] Final Memory: 74M/638M
          [INFO] ------------------------------------------------------------------------
          
          Hari Shreedharan added a comment -

          Final patch from rb

          Brock Noland added a comment -

          Committed to trunk and 1.4! Nice work Hari! This was a big one and you did a great job following through!

          Hudson added a comment -

          Integrated in flume-trunk #388 (See https://builds.apache.org/job/flume-trunk/388/)
          FLUME-1516: FileChannel Write Dual Checkpoints to avoid replays (Revision 6ca616800ec897551fbb14959ce3a5f0c1d69aed)

          Result = FAILURE
          brock : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=6ca616800ec897551fbb14959ce3a5f0c1d69aed
          Files :

          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/EventQueueBackingStore.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelConfiguration.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/EventQueueBackingStoreFile.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/ReplayHandler.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Serialization.java
          • flume-ng-channels/flume-file-channel/src/main/proto/filechannel.proto
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFileV3.java
          • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelRestart.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFile.java
          • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestUtils.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelBase.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/proto/ProtosFactory.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/EventQueueBackingStoreFactory.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/EventQueueBackingStoreFileV3.java

            People

            • Assignee: Hari Shreedharan
            • Reporter: Brock Noland
            • Votes: 0
            • Watchers: 5
