Hadoop Common / HADOOP-3614

TestLeaseRecovery fails when run with assertions enabled.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags:
      Reviewed

      Description

      I used the -ea JVM option to run the test, and it fails in FSDataset.findBlockFile(Block) on

      assert b.generationStamp == GenerationStamp.WILDCARD_STAMP;
      

      With assertions disabled, the test passes.
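      For readers less familiar with Java assertion semantics: `assert` statements are compiled in but skipped at run time unless the JVM is started with -ea, which is why the failure only shows up under that flag. A minimal, hypothetical illustration (the class and variable names are made up; the stamp values 1 and 1001 are the ones reported in the discussion below):

```java
public class AssertDemo {
    static final long WILDCARD_STAMP = 1;   // value reported for GenerationStamp.WILDCARD_STAMP

    public static void main(String[] args) {
        long blockStamp = 1001L;            // stamp observed on the block in the failing test
        // With -ea the assert throws AssertionError (1001 != 1);
        // without -ea the condition is never evaluated and execution continues.
        try {
            assert blockStamp == WILDCARD_STAMP : "stamp=" + blockStamp;
            System.out.println("assert skipped or passed");
        } catch (AssertionError e) {
            System.out.println("assert fired: " + e.getMessage());
        }
    }
}
```

      Whether the first or second line prints depends entirely on whether the JVM was launched with -ea; the code itself is identical in both runs.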

      1. 3614_20080926c.patch
        4 kB
        Tsz Wo Nicholas Sze
      2. 3614_20080926c_0.18.patch
        3 kB
        Tsz Wo Nicholas Sze
      3. 3614_20080926b.patch
        5 kB
        Tsz Wo Nicholas Sze
      4. 3614_20080926.patch
        5 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Konstantin Shvachko added a comment -

          I see the same assert with TestLeaseRecovery2.

          Bill de hOra added a comment - edited

          When I step through the tests I see that 1001L is the value of the generation stamp for newblocks[i] below (from TestLeaseRecovery.testBlockSynchronization):

                //update blocks with random block sizes
                Block[] newblocks = new Block[REPLICATION_NUM];
                for(int i = 0; i < REPLICATION_NUM; i++) {
                  newblocks[i] = new Block(lastblock.getBlockId(), newblocksizes[i],
                      lastblock.getGenerationStamp());
                  idps[i].updateBlock(lastblock, newblocks[i], false);
                  checkMetaInfo(newblocks[i], idps[i]);
                }
          

          When that gets to FSDataset.findBlockFile, the assert evaluates 1 != 1001L, because GenerationStamp.WILDCARD_STAMP is 1. I'm assuming the 1001L value is calculated as an offset from GenerationStamp.FIRST_VALID_STAMP (which is 1000L), and that 1001L comes from using the GenerationStamp.nextStamp() method:

            public File findBlockFile(Block b) {
              assert b.getGenerationStamp() == GenerationStamp.WILDCARD_STAMP;
          
              File blockfile = null;
              ActiveFile activefile = ongoingCreates.get(b);
              if (activefile != null) {
                blockfile = activefile.file;
              }
              if (blockfile == null) {
                blockfile = getFile(b);
              }
              if (blockfile == null) {
                if (DataNode.LOG.isDebugEnabled()) {
                  DataNode.LOG.debug("ongoingCreates=" + ongoingCreates);
                  DataNode.LOG.debug("volumeMap=" + volumeMap);
                }
              }
              return blockfile;
            }
          

          That part of the code was last touched in r673857 as part of HADOOP-2885, where findBlockFile was made public. Trawling the commits it looks like the assert was added as part of HADOOP-3310 (r662513; I wish svn had bisect).

          What I don't understand is why the assert is there; I don't see how it could ever hold where stamps are actually being calculated. I suspect the assert needs to come out, and you just might be the first person to run with assertions on.
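          The stamp bookkeeping described above can be modelled in a few lines. This is a hypothetical mini-model, not the real GenerationStamp class; the constant names and the values 1 and 1000L are taken from the comment, everything else is illustrative:

```java
public class StampSketch {
    // Values as quoted in the comment above.
    static final long WILDCARD_STAMP = 1;
    static final long FIRST_VALID_STAMP = 1000L;

    private long stamp = FIRST_VALID_STAMP;

    /** Illustrative stand-in for GenerationStamp.nextStamp(). */
    long nextStamp() {
        return ++stamp;
    }

    public static void main(String[] args) {
        StampSketch gs = new StampSketch();
        long blockStamp = gs.nextStamp();                  // first call yields 1001
        System.out.println(blockStamp);                    // prints 1001
        System.out.println(blockStamp == WILDCARD_STAMP);  // prints false
    }
}
```

          Any block whose stamp was produced by nextStamp() is at least 1001, so comparing it against WILDCARD_STAMP (1) can never succeed, which matches the observed 1 != 1001L failure.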

          Tsz Wo Nicholas Sze added a comment -

          Linking this to HADOOP-2658.

          Konstantin Shvachko added a comment -

          I think the assert is right here. It means that findBlockFile() should find a file containing the block by its block ID, no matter what its generation stamp is, and therefore findBlockFile() should always be called with the generation stamp == WILDCARD_STAMP.
          What is incorrect here is the API: findBlockFile(long blockID) should take the block ID as a parameter, and then the assert can be removed.
          A comment stating exactly what the method does would be extremely helpful.
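          A sketch of the signature change being proposed. This is hypothetical code, not the actual patch: the class name, the map, and the file path are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class FSDatasetSketch {
    // Stand-in for the datanode's block-id -> block-file mapping.
    private final Map<Long, String> volumeMap = new HashMap<>();

    /**
     * Finds the file containing the block with the given ID, regardless of
     * generation stamp. Taking only the ID makes the WILDCARD_STAMP assert
     * unnecessary, since callers can no longer pass a stamp at all.
     */
    public String findBlockFile(long blockId) {
        return volumeMap.get(blockId);
    }

    public static void main(String[] args) {
        FSDatasetSketch ds = new FSDatasetSketch();
        ds.volumeMap.put(42L, "/data/current/blk_42");
        System.out.println(ds.findBlockFile(42L));  // prints /data/current/blk_42
    }
}
```

          The design point is that the wildcard invariant moves from a run-time assert into the type of the parameter itself: a long block ID simply cannot carry a generation stamp.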

          Tsz Wo Nicholas Sze added a comment -

          3614_20080926.patch: fixed the bug.

          Tsz Wo Nicholas Sze added a comment -

          3614_20080926b.patch: fixed some comments and code.

          Konstantin Shvachko added a comment -

          1. In FSDataset.updateBlock() you do not need to set oldblock.setGenerationStamp(GenerationStamp.WILDCARD_STAMP).
          2. As far as I can see, TestFileCreation should eventually be moved to the org.apache.hadoop.hdfs.server.datanode package in the test directory, because it uses mostly public (legitimately) classes and methods from other packages, and some methods like getBlockFile() were made public just to keep the test in its current package. So I would rather leave findBlockFile() public for now and not introduce a new DataNodeTestUtil class.
          3. Please add a JavaDoc comment for findBlockFile() so that there is no confusion about its semantics, which is rather hard to see right away.
          Tsz Wo Nicholas Sze added a comment -

          3614_20080926c.patch: incorporated Konstantin's comments

          Konstantin Shvachko added a comment -

          +1
          Looks like this is a bug, and the fix should be committed to all versions starting from 0.18.

          Tsz Wo Nicholas Sze added a comment -

          The patch fixes a bug which may cause other append-related tests to fail.

          Tsz Wo Nicholas Sze added a comment -

          Passed all tests locally. Submitting...

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12391069/3614_20080926c.patch
          against trunk revision 699517.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3387/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3387/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3387/console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          3614_20080926c_0.18.patch: for 0.18

          Tsz Wo Nicholas Sze added a comment -

          I just committed this to 0.18, 0.19 and trunk.

          Hudson added a comment -

          Integrated in Hadoop-trunk #618 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/618/ )

            People

            • Assignee: Tsz Wo Nicholas Sze
            • Reporter: Konstantin Shvachko
            • Votes: 0
            • Watchers: 0
