Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.5.2, 3.6.0
    • Component/s: None
    • Labels:
      None

      Description

      PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

      1. ZOOKEEPER-872
        3 kB
        Vishal Kher
      2. ZOOKEEPER-872.patch
        3 kB
        Mahadev konar

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        279d 12h 44m 3 Mahadev konar 07/Jul/11 08:39
        Patch Available Patch Available Open Open
        190d 6h 40m 3 Patrick Hunt 28/Dec/11 21:16
        Michi Mutsuzaki made changes -
        Fix Version/s 3.5.2 [ 12331981 ]
        Fix Version/s 3.6.0 [ 12326518 ]
        Fix Version/s 3.5.1 [ 12326786 ]
        Patrick Hunt made changes -
        Fix Version/s 3.5.1 [ 12326786 ]
        Fix Version/s 3.5.0 [ 12316644 ]
        Patrick Hunt made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Patrick Hunt added a comment -

        The updated patch seems to be the same as the original path. Mahadev are you sure you attached the correct one? We need to address the comments before committing this.

        Show
        Patrick Hunt added a comment - The updated patch seems to be the same as the original path. Mahadev are you sure you attached the correct one? We need to address the comments before committing this.
        Hide
        Patrick Hunt added a comment -

        Can you refresh the patch? not applying to trunk.

        Show
        Patrick Hunt added a comment - Can you refresh the patch? not applying to trunk.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12485541/ZOOKEEPER-872.patch
        against trunk revision 1214571.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The patch appears to cause tar ant target to fail.

        -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/829//testReport/
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/829//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485541/ZOOKEEPER-872.patch against trunk revision 1214571. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/829//testReport/ Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/829//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12485541/ZOOKEEPER-872.patch
        against trunk revision 1172406.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The patch appears to cause tar ant target to fail.

        -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/575//testReport/
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/575//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485541/ZOOKEEPER-872.patch against trunk revision 1172406. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/575//testReport/ Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/575//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12485541/ZOOKEEPER-872.patch
        against trunk revision 1143688.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/374//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/374//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/374//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485541/ZOOKEEPER-872.patch against trunk revision 1143688. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/374//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/374//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/374//console This message is automatically generated.
        Mahadev konar made changes -
        Fix Version/s 3.5.0 [ 12316644 ]
        Fix Version/s 3.4.0 [ 12314469 ]
        Hide
        Mahadev konar added a comment -

        Moving this out to 3.5 since this isnt critical. Vishal, in case you need this for 3.4, can you please upload a patch by friday?

        Show
        Mahadev konar added a comment - Moving this out to 3.5 since this isnt critical. Vishal, in case you need this for 3.4, can you please upload a patch by friday?
        Mahadev konar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-872.patch [ 12485541 ]
        Hide
        Mahadev konar added a comment -

        Re uploading Vishal's patch.

        Show
        Mahadev konar added a comment - Re uploading Vishal's patch.
        Mahadev konar made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Mahadev konar added a comment -

        cancelling the patch. the patch is not relative to top level dir.

        Show
        Mahadev konar added a comment - cancelling the patch. the patch is not relative to top level dir.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12454621/ZOOKEEPER-872
        against trunk revision 1138100.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/334//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12454621/ZOOKEEPER-872 against trunk revision 1138100. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/334//console This message is automatically generated.
        Benjamin Reed made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Vishal Kher added a comment -

        Hi Pat,

        The current code prints to stdout. We have a RMI service that has ZK server embedded in it. We do this so that we can run/start/stop ZK across platforms without having to write platform specific scripts. In this server, we start a thread that periodically calls PurgeTxnlog.purge(). As you pointed out, we should have a -q flag to direct to log instead stdout to statisfy both the approaches. I will make that change.

        We chose number 2 here because we think having only one backup will be enough. It is not clear to us under what conditions the additional backup will be useful.

        Backups are useful under the following scenario (correct me if I am wrong):
        1. The current ZooKeeper transaction log and/or snapshot is corrupted, but the past snapshots and transaction logs are ok. Corrupting can mean either disk file corruption or corrupting of transaction entries in the log. We store ZooKeeper data on mirrored disks.
        2. The application itself made some errors that requires reverting back to the older version.

        For the first point, having one additional backup would suffice. The second point is really tricky. I am not sure how the application can decide which snapshot to revert to. I think in most cases it will be trial and error. It is not clear to me how to estimate the number of backups needed. Also, it is not clear how one would go about going back in time. I looked at LogFormatter utility and that utility does not help much in undoing the erroneous transactions for case 2 above. In general, I think it is good to enforce users to have a minimum of one backup.

        Related question: Is there hash on the log files (or internal tree structures) that can tell the ZooKeeper server if the logs are corrupted. If yes, the zookeeper server can verify the hash during startup and take some action based on that. For example, make sure that it never becomes a leader until it gets the correct snapshot from the existing leader (otherwise it may endup corrupting other server's log). "Corrupting" here refers to the case where the file is readable, but one or more transactions in the log are bad.

        I am not sure if there is a test for this. If I remember correctly, there is a bug that causes the purge() function to leave behind one addition log file. Please refer to my question above about findNRecentSnapshots(). I can add a test or modify the pruge utlity once we have concluded this discussion.

        Show
        Vishal Kher added a comment - Hi Pat, The current code prints to stdout. We have a RMI service that has ZK server embedded in it. We do this so that we can run/start/stop ZK across platforms without having to write platform specific scripts. In this server, we start a thread that periodically calls PurgeTxnlog.purge(). As you pointed out, we should have a -q flag to direct to log instead stdout to statisfy both the approaches. I will make that change. We chose number 2 here because we think having only one backup will be enough. It is not clear to us under what conditions the additional backup will be useful. Backups are useful under the following scenario (correct me if I am wrong): 1. The current ZooKeeper transaction log and/or snapshot is corrupted, but the past snapshots and transaction logs are ok. Corrupting can mean either disk file corruption or corrupting of transaction entries in the log. We store ZooKeeper data on mirrored disks. 2. The application itself made some errors that requires reverting back to the older version. For the first point, having one additional backup would suffice. The second point is really tricky. I am not sure how the application can decide which snapshot to revert to. I think in most cases it will be trial and error. It is not clear to me how to estimate the number of backups needed. Also, it is not clear how one would go about going back in time. I looked at LogFormatter utility and that utility does not help much in undoing the erroneous transactions for case 2 above. In general, I think it is good to enforce users to have a minimum of one backup. Related question: Is there hash on the log files (or internal tree structures) that can tell the ZooKeeper server if the logs are corrupted. If yes, the zookeeper server can verify the hash during startup and take some action based on that. For example, make sure that it never becomes a leader until it gets the correct snapshot from the existing leader (otherwise it may endup corrupting other server's log). "Corrupting" here refers to the case where the file is readable, but one or more transactions in the log are bad. I am not sure if there is a test for this. If I remember correctly, there is a bug that causes the purge() function to leave behind one addition log file. Please refer to my question above about findNRecentSnapshots(). I can add a test or modify the pruge utlity once we have concluded this discussion.
        Patrick Hunt made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Patrick Hunt added a comment -

        Hi Vishal, I noticed a couple issues.

        This class is a command line utility. As such we are outputting to both the command line and to the log. The usage() in particular should go to std out so that the user will see it regardless of the log settings (fine if you want to output it to LOG as well, but I think this is unnecessary).

        good catch on the error handling for this:

        public static void purge(File dataDir, File snapDir, int num) throws IOException {

        • if (num < 3) {
        • throw new IllegalArgumentException("count should be greater than 3");
          + if (num < 2) { + throw new IllegalArgumentException("count should be greater than 1"); }

        However the number 3 was chosen to ensure that ppl don't shoot themselves in the foot (if the most recent logs get corrupted we'll fall back to the prior when attempting to recover). There really should be a comment to this effect (would be good to add). I don't know how Mahadev feels on this setting (min 3 vs some other number) but he might have more insight as IIRC he implemented this originally.

        this following is there to provide feedback to the user when running on command line:

        • System.out.println("Removing file: "+
        • DateFormat.getDateTimeInstance().format(f.lastModified())+
        • "\t"+f.getPath());

        again, regardless of logging setup.

        Perhaps we should have a "-q" option that turns off the CLI logging and just logs to the log file? I know this has been an issue previously (stdout/err) given that cron will spitout emails by default containing stdout/err.

        Also, is there a test for this? If you're up to it would be great to add.

        Show
        Patrick Hunt added a comment - Hi Vishal, I noticed a couple issues. This class is a command line utility. As such we are outputting to both the command line and to the log. The usage() in particular should go to std out so that the user will see it regardless of the log settings (fine if you want to output it to LOG as well, but I think this is unnecessary). good catch on the error handling for this: public static void purge(File dataDir, File snapDir, int num) throws IOException { if (num < 3) { throw new IllegalArgumentException("count should be greater than 3"); + if (num < 2) { + throw new IllegalArgumentException("count should be greater than 1"); } However the number 3 was chosen to ensure that ppl don't shoot themselves in the foot (if the most recent logs get corrupted we'll fall back to the prior when attempting to recover). There really should be a comment to this effect (would be good to add). I don't know how Mahadev feels on this setting (min 3 vs some other number) but he might have more insight as IIRC he implemented this originally. this following is there to provide feedback to the user when running on command line: System.out.println("Removing file: "+ DateFormat.getDateTimeInstance().format(f.lastModified())+ "\t"+f.getPath()); again, regardless of logging setup. Perhaps we should have a "-q" option that turns off the CLI logging and just logs to the log file? I know this has been an issue previously (stdout/err) given that cron will spitout emails by default containing stdout/err. Also, is there a test for this? If you're up to it would be great to add.
        Mahadev konar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Mahadev konar made changes -
        Assignee Vishal K [ vishalmlst ]
        Fix Version/s 3.4.0 [ 12314469 ]
        Hide
        Vishal Kher added a comment -

        Is there any reason why we dont have a findNRecentLogs(int n) method to return the n most recent logs (similar to findNRecentSnapshots)?
        While testing I noticed that it can happen that a log file is left undeleted depending on the transaction id of the nth snapshot file. Thus, we will have n snapshots, but n+1 log file left behind instead of n. This file will be deleted after the next snapshot is taken.

        We won't have this problem if we have a function that keeps the n most recent logs and removes the rest.

        Show
        Vishal Kher added a comment - Is there any reason why we dont have a findNRecentLogs(int n) method to return the n most recent logs (similar to findNRecentSnapshots)? While testing I noticed that it can happen that a log file is left undeleted depending on the transaction id of the nth snapshot file. Thus, we will have n snapshots, but n+1 log file left behind instead of n. This file will be deleted after the next snapshot is taken. We won't have this problem if we have a function that keeps the n most recent logs and removes the rest.
        Vishal Kher made changes -
        Field Original Value New Value
        Attachment ZOOKEEPER-872 [ 12454621 ]
        Hide
        Vishal Kher added a comment -

        patch attached.

        Show
        Vishal Kher added a comment - patch attached.
        Vishal Kher created issue -

          People

          • Assignee:
            Vishal Kher
            Reporter:
            Vishal Kher
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development