Hadoop Common
  1. Hadoop Common
  2. HADOOP-5314

needToSave incorrectly calculated in loadFSImage()

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      FSImage.loadFSImage() incorrectly calculates the value of needToSave, which is always true and results in saving image at startup even if that is not necessary.

      1. HADOOP-5314.patch
        9 kB
        Jakob Homan
      2. HADOOP-5314.patch
        9 kB
        Jakob Homan
      3. HADOOP-5314.patch
        10 kB
        Jakob Homan

        Issue Links

          Activity

          Hide
          Konstantin Shvachko added a comment -

          I just committed this. Thank you Jakob.

          Show
          Konstantin Shvachko added a comment - I just committed this. Thank you Jakob.
          Hide
          Jakob Homan added a comment -

          Indeed I did. Touché. Here's the real correct one.

          Show
          Jakob Homan added a comment - Indeed I did. Touché. Here's the real correct one.
          Hide
          Konstantin Shvachko added a comment -

          You've uploaded the same file.

          Show
          Konstantin Shvachko added a comment - You've uploaded the same file.
          Hide
          Jakob Homan added a comment -

          Regenerated patch AND granted license....

          Show
          Jakob Homan added a comment - Regenerated patch AND granted license....
          Hide
          Jakob Homan added a comment -

          Regenerated patch to apply against trunk.

          Show
          Jakob Homan added a comment - Regenerated patch to apply against trunk.
          Hide
          Konstantin Shvachko added a comment -

          Need to regenerate the patch to reflect latest commits.

          Show
          Konstantin Shvachko added a comment - Need to regenerate the patch to reflect latest commits.
          Hide
          Jakob Homan added a comment -

          The failed contrib test, TestCapacityScheduler, runs fine on my machines.

              [junit] Running org.apache.hadoop.mapred.TestCapacityScheduler
              [junit] Tests run: 23, Failures: 0, Errors: 0, Time elapsed: 3.404 sec
              [junit] Running org.apache.hadoop.mapred.TestCapacitySchedulerConf
              [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.452 sec
              [junit] Running org.apache.hadoop.mapred.TestJobInitialization
              [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.698 sec
              [junit] Running org.apache.hadoop.mapred.TestQueueCapacities
              [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 476.918 sec
          
          Show
          Jakob Homan added a comment - The failed contrib test, TestCapacityScheduler, runs fine on my machines. [junit] Running org.apache.hadoop.mapred.TestCapacityScheduler [junit] Tests run: 23, Failures: 0, Errors: 0, Time elapsed: 3.404 sec [junit] Running org.apache.hadoop.mapred.TestCapacitySchedulerConf [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.452 sec [junit] Running org.apache.hadoop.mapred.TestJobInitialization [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.698 sec [junit] Running org.apache.hadoop.mapred.TestQueueCapacities [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 476.918 sec
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407641/HADOOP-5314.patch
          against trunk revision 772960.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12407641/HADOOP-5314.patch against trunk revision 772960. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/console This message is automatically generated.
          Hide
          Jakob Homan added a comment -

          Thanks for the quick review. Actually, it wasn't sorted before and is now. Eclipse auto-sorted when I imported the set object.

          Show
          Jakob Homan added a comment - Thanks for the quick review. Actually, it wasn't sorted before and is now. Eclipse auto-sorted when I imported the set object.
          Hide
          Konstantin Shvachko added a comment -

          The "the overly complicated statement" was intended to avoid using extra objects like HashSet. But your approach works too.
          Thanks for noting that editLog should be opened explicitly if the image was not saved.
          I do not see any rational in changing the order of the imports. They were lexicographically ordered. Could you please restore the original order.
          +1 other than that.

          Show
          Konstantin Shvachko added a comment - The "the overly complicated statement" was intended to avoid using extra objects like HashSet. But your approach works too. Thanks for noting that editLog should be opened explicitly if the image was not saved. I do not see any rational in changing the order of the imports. They were lexicographically ordered. Could you please restore the original order. +1 other than that.
          Hide
          Jakob Homan added a comment -

          submitting patch

          Show
          Jakob Homan added a comment - submitting patch
          Hide
          Jakob Homan added a comment -

          I'm afraid that didn't work - at the point loadfsimage is called, the files already exist. I replaced the overly complicated statement that was trying to determine if there were more than one checkpoint times with a set of checkpoint times. It's more elegant - after processing all the storage directories, check the cardinality of the set, if it's greater than one, there were more than one checkpoint times recorded and needToSave should be set to true.

          Fixing this bug revealed another bug where the editLog was only being opened when the fsimage was being saved. To fix this, I added a call to open the editlog to the end of loadfsimage.

          Also added some minor readability improvements to the relevant sections of code.

          Passes all unit tests.

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          I can't figure out a reasonable way to unit test these fixes, though I've manually tested them several times.

          Show
          Jakob Homan added a comment - I'm afraid that didn't work - at the point loadfsimage is called, the files already exist. I replaced the overly complicated statement that was trying to determine if there were more than one checkpoint times with a set of checkpoint times. It's more elegant - after processing all the storage directories, check the cardinality of the set, if it's greater than one, there were more than one checkpoint times recorded and needToSave should be set to true. Fixing this bug revealed another bug where the editLog was only being opened when the fsimage was being saved. To fix this, I added a call to open the editlog to the end of loadfsimage. Also added some minor readability improvements to the relevant sections of code. Passes all unit tests. [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. I can't figure out a reasonable way to unit test these fixes, though I've manually tested them several times.
          Hide
          Konstantin Shvachko added a comment -

          I think this should solve the problem:

          checkpointTime = readCheckpointTime(sd);
          needToSave |= (!editsExists);
          needToSave |= (!imageExists);
          if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE) && 
             (latestNameCheckpointTime < checkpointTime) && imageExists) {
            // Force saving of new image if checkpoint time
            // is not same in all of the storage directories.
            if ((latestNameCheckpointTime != Long.MIN_VALUE) && 
                (checkpointTime != latestNameCheckpointTime)) {
              needToSave |= true;
            }
            latestNameCheckpointTime = checkpointTime;
            latestNameSD = sd;
          }
          if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS) && 
               (latestEditsCheckpointTime < checkpointTime) && editsExists) {
            // Force saving of new image if checkpoint time
            // is not same in all of the storage directories.
            if ((latestEditsCheckpointTime != Long.MIN_VALUE) && 
                (checkpointTime != latestNameCheckpointTime)) {
              needToSave |= true;
            }
            latestEditsCheckpointTime = checkpointTime;
            latestEditsSD = sd;
          }
          
          Show
          Konstantin Shvachko added a comment - I think this should solve the problem: checkpointTime = readCheckpointTime(sd); needToSave |= (!editsExists); needToSave |= (!imageExists); if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE) && (latestNameCheckpointTime < checkpointTime) && imageExists) { // Force saving of new image if checkpoint time // is not same in all of the storage directories. if ((latestNameCheckpointTime != Long .MIN_VALUE) && (checkpointTime != latestNameCheckpointTime)) { needToSave |= true ; } latestNameCheckpointTime = checkpointTime; latestNameSD = sd; } if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS) && (latestEditsCheckpointTime < checkpointTime) && editsExists) { // Force saving of new image if checkpoint time // is not same in all of the storage directories. if ((latestEditsCheckpointTime != Long .MIN_VALUE) && (checkpointTime != latestNameCheckpointTime)) { needToSave |= true ; } latestEditsCheckpointTime = checkpointTime; latestEditsSD = sd; }
          Hide
          Konstantin Shvachko added a comment -

          The problem is that on the very first iteration of the loop needToSave is set to true and cannot be reset to false after that. It happens because checkpointTime never equals Long.MIN_VALUE while both latestNameCheckpointTime and latestEditsCheckpointTime do.

          Show
          Konstantin Shvachko added a comment - The problem is that on the very first iteration of the loop needToSave is set to true and cannot be reset to false after that. It happens because checkpointTime never equals Long.MIN_VALUE while both latestNameCheckpointTime and latestEditsCheckpointTime do.

            People

            • Assignee:
              Jakob Homan
              Reporter:
              Konstantin Shvachko
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development