Hadoop Common
  1. Hadoop Common
  2. HADOOP-5314

needToSave incorrectly calculated in loadFSImage()

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      FSImage.loadFSImage() incorrectly calculates the value of needToSave, which is always true and results in saving image at startup even if that is not necessary.

      1. HADOOP-5314.patch
        10 kB
        Jakob Homan
      2. HADOOP-5314.patch
        9 kB
        Jakob Homan
      3. HADOOP-5314.patch
        9 kB
        Jakob Homan

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          73d 18h 15m 1 Jakob Homan 08/May/09 18:05
          Patch Available Patch Available Resolved Resolved
          7d 5h 5m 1 Konstantin Shvachko 15/May/09 23:10
          Resolved Resolved Closed Closed
          465d 22h 25m 1 Tom White 24/Aug/10 21:36
          Todd Lipcon made changes -
          Link This issue is related to HDFS-1981 [ HDFS-1981 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Konstantin Shvachko made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.21.0 [ 12313563 ]
          Fix Version/s 0.19.2 [ 12313650 ]
          Resolution Fixed [ 1 ]
          Hide
          Konstantin Shvachko added a comment -

          I just committed this. Thank you Jakob.

          Show
          Konstantin Shvachko added a comment - I just committed this. Thank you Jakob.
          Jakob Homan made changes -
          Attachment HADOOP-5314.patch [ 12408270 ]
          Hide
          Jakob Homan added a comment -

          Indeed I did. Touché. Here's the real correct one.

          Show
          Jakob Homan added a comment - Indeed I did. Touché. Here's the real correct one.
          Hide
          Konstantin Shvachko added a comment -

          You've uploaded the same file.

          Show
          Konstantin Shvachko added a comment - You've uploaded the same file.
          Jakob Homan made changes -
          Attachment HADOOP-5314.patch [ 12408210 ]
          Hide
          Jakob Homan added a comment -

          Regenerated patch AND granted license....

          Show
          Jakob Homan added a comment - Regenerated patch AND granted license....
          Jakob Homan made changes -
          Attachment HADOOP-5314.patch [ 12408209 ]
          Jakob Homan made changes -
          Attachment HADOOP-5314.patch [ 12408209 ]
          Hide
          Jakob Homan added a comment -

          Regenerated patch to apply against trunk.

          Show
          Jakob Homan added a comment - Regenerated patch to apply against trunk.
          Hide
          Konstantin Shvachko added a comment -

          Need to regenerate the patch to reflect latest commits.

          Show
          Konstantin Shvachko added a comment - Need to regenerate the patch to reflect latest commits.
          Hide
          Jakob Homan added a comment -

          The failed contrib test, TestCapacityScheduler, runs fine on my machines.

              [junit] Running org.apache.hadoop.mapred.TestCapacityScheduler
              [junit] Tests run: 23, Failures: 0, Errors: 0, Time elapsed: 3.404 sec
              [junit] Running org.apache.hadoop.mapred.TestCapacitySchedulerConf
              [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.452 sec
              [junit] Running org.apache.hadoop.mapred.TestJobInitialization
              [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.698 sec
              [junit] Running org.apache.hadoop.mapred.TestQueueCapacities
              [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 476.918 sec
          
          Show
          Jakob Homan added a comment - The failed contrib test, TestCapacityScheduler, runs fine on my machines. [junit] Running org.apache.hadoop.mapred.TestCapacityScheduler [junit] Tests run: 23, Failures: 0, Errors: 0, Time elapsed: 3.404 sec [junit] Running org.apache.hadoop.mapred.TestCapacitySchedulerConf [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.452 sec [junit] Running org.apache.hadoop.mapred.TestJobInitialization [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.698 sec [junit] Running org.apache.hadoop.mapred.TestQueueCapacities [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 476.918 sec
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407641/HADOOP-5314.patch
          against trunk revision 772960.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12407641/HADOOP-5314.patch against trunk revision 772960. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/316/console This message is automatically generated.
          Hide
          Jakob Homan added a comment -

          Thanks for the quick review. Actually, it wasn't sorted before and is now. Eclipse auto-sorted when I imported the set object.

          Show
          Jakob Homan added a comment - Thanks for the quick review. Actually, it wasn't sorted before and is now. Eclipse auto-sorted when I imported the set object.
          Hide
          Konstantin Shvachko added a comment -

          The "the overly complicated statement" was intended to avoid using extra objects like HashSet. But your approach works too.
          Thanks for noting that editLog should be opened explicitly if the image was not saved.
          I do not see any rational in changing the order of the imports. They were lexicographically ordered. Could you please restore the original order.
          +1 other than that.

          Show
          Konstantin Shvachko added a comment - The "the overly complicated statement" was intended to avoid using extra objects like HashSet. But your approach works too. Thanks for noting that editLog should be opened explicitly if the image was not saved. I do not see any rational in changing the order of the imports. They were lexicographically ordered. Could you please restore the original order. +1 other than that.
          Jakob Homan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Jakob Homan added a comment -

          submitting patch

          Show
          Jakob Homan added a comment - submitting patch
          Jakob Homan made changes -
          Attachment HADOOP-5314.patch [ 12407641 ]
          Hide
          Jakob Homan added a comment -

          I'm afraid that didn't work - at the point loadfsimage is called, the files already exist. I replaced the overly complicated statement that was trying to determine if there were more than one checkpoint times with a set of checkpoint times. It's more elegant - after processing all the storage directories, check the cardinality of the set, if it's greater than one, there were more than one checkpoint times recorded and needToSave should be set to true.

          Fixing this bug revealed another bug where the editLog was only being opened when the fsimage was being saved. To fix this, I added a call to open the editlog to the end of loadfsimage.

          Also added some minor readability improvements to the relevant sections of code.

          Passes all unit tests.

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          I can't figure out a reasonable way to unit test these fixes, though I've manually tested them several times.

          Show
          Jakob Homan added a comment - I'm afraid that didn't work - at the point loadfsimage is called, the files already exist. I replaced the overly complicated statement that was trying to determine if there were more than one checkpoint times with a set of checkpoint times. It's more elegant - after processing all the storage directories, check the cardinality of the set, if it's greater than one, there were more than one checkpoint times recorded and needToSave should be set to true. Fixing this bug revealed another bug where the editLog was only being opened when the fsimage was being saved. To fix this, I added a call to open the editlog to the end of loadfsimage. Also added some minor readability improvements to the relevant sections of code. Passes all unit tests. [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. I can't figure out a reasonable way to unit test these fixes, though I've manually tested them several times.
          Hide
          Konstantin Shvachko added a comment -

          I think this should solve the problem:

          checkpointTime = readCheckpointTime(sd);
          needToSave |= (!editsExists);
          needToSave |= (!imageExists);
          if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE) && 
             (latestNameCheckpointTime < checkpointTime) && imageExists) {
            // Force saving of new image if checkpoint time
            // is not same in all of the storage directories.
            if ((latestNameCheckpointTime != Long.MIN_VALUE) && 
                (checkpointTime != latestNameCheckpointTime)) {
              needToSave |= true;
            }
            latestNameCheckpointTime = checkpointTime;
            latestNameSD = sd;
          }
          if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS) && 
               (latestEditsCheckpointTime < checkpointTime) && editsExists) {
            // Force saving of new image if checkpoint time
            // is not same in all of the storage directories.
            if ((latestEditsCheckpointTime != Long.MIN_VALUE) && 
                (checkpointTime != latestNameCheckpointTime)) {
              needToSave |= true;
            }
            latestEditsCheckpointTime = checkpointTime;
            latestEditsSD = sd;
          }
          
          Show
          Konstantin Shvachko added a comment - I think this should solve the problem: checkpointTime = readCheckpointTime(sd); needToSave |= (!editsExists); needToSave |= (!imageExists); if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE) && (latestNameCheckpointTime < checkpointTime) && imageExists) { // Force saving of new image if checkpoint time // is not same in all of the storage directories. if ((latestNameCheckpointTime != Long .MIN_VALUE) && (checkpointTime != latestNameCheckpointTime)) { needToSave |= true ; } latestNameCheckpointTime = checkpointTime; latestNameSD = sd; } if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS) && (latestEditsCheckpointTime < checkpointTime) && editsExists) { // Force saving of new image if checkpoint time // is not same in all of the storage directories. if ((latestEditsCheckpointTime != Long .MIN_VALUE) && (checkpointTime != latestNameCheckpointTime)) { needToSave |= true ; } latestEditsCheckpointTime = checkpointTime; latestEditsSD = sd; }
          Jakob Homan made changes -
          Assignee Jakob Homan [ jghoman ]
          Konstantin Shvachko made changes -
          Description {{FSImage.loadFSImage()}} incorrectly calculates the value of {{needToSave }}, which is always true and results in saving image at startup even if that is not necessary. {{FSImage.loadFSImage()}} incorrectly calculates the value of {{needToSave}}, which is always true and results in saving image at startup even if that is not necessary.
          Konstantin Shvachko made changes -
          Field Original Value New Value
          Component/s dfs [ 12310710 ]
          Hide
          Konstantin Shvachko added a comment -

          The problem is that on the very first iteration of the loop needToSave is set to true and cannot be reset to false after that. It happens because checkpointTime never equals Long.MIN_VALUE while both latestNameCheckpointTime and latestEditsCheckpointTime do.

          Show
          Konstantin Shvachko added a comment - The problem is that on the very first iteration of the loop needToSave is set to true and cannot be reset to false after that. It happens because checkpointTime never equals Long.MIN_VALUE while both latestNameCheckpointTime and latestEditsCheckpointTime do.
          Konstantin Shvachko created issue -

            People

            • Assignee:
              Jakob Homan
              Reporter:
              Konstantin Shvachko
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development