Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed

      Description

      Currently the key to Block is block id + generation stamp. I propose to change it to be only the block id. This is based on the following properties of the dfs cluster:
      1. On each datanode only one replica of block exists. Therefore there is only one generation of a block.
      2. NameNode has only one entry for a block in its blocks map.

      With this change, search for a block/replica's meta information is easier since most of the time we know a block's id but may not know its generation stamp.
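      The two properties above suggest keying all block maps by block id alone. As a rough illustration — a minimal stand-in class, not the actual org.apache.hadoop.hdfs.protocol.Block, with field names that are assumptions — the proposed keying would look like:

```java
// Minimal stand-in for the proposed keying; NOT the actual
// org.apache.hadoop.hdfs.protocol.Block (field names are assumptions).
class Block implements Comparable<Block> {
  private final long blockId;
  private final long numBytes;
  private final long generationStamp; // a property, no longer part of the key

  Block(long blockId, long numBytes, long generationStamp) {
    this.blockId = blockId;
    this.numBytes = numBytes;
    this.generationStamp = generationStamp;
  }

  @Override
  public int compareTo(Block that) {  // order by id only
    return Long.compare(this.blockId, that.blockId);
  }

  @Override
  public boolean equals(Object o) {   // equal iff same id
    return o instanceof Block && compareTo((Block) o) == 0;
  }

  @Override
  public int hashCode() {             // id-based, consistent with equals
    return Long.hashCode(blockId);
  }
}
```

      With this keying, a lookup by id succeeds even when the caller does not know the replica's current generation stamp.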

      1. blockIdAsKey.patch
        23 kB
        Konstantin Shvachko
      2. blockIdAsKey.patch
        20 kB
        Konstantin Shvachko
      3. blockKey.patch
        25 kB
        Hairong Kuang

        Issue Links

          Activity

          Hairong Kuang added a comment -

          This patch changes Block's key from block id + generation stamp to block id only. It also removes the GenerationStamp class and the use of the generation wildcard.

          Tsz Wo Nicholas Sze added a comment -

          Changing the key of the maps from block id and generation stamp to only block id sounds like a good idea. However, changing the Block.equals(..) method may not be correct. Conceptually, block A is not equal to block B if they have the same id but distinct generation stamps.

          Hairong Kuang added a comment -

          The way I look at this problem is that I'd like to treat the generation stamp as a property of a block, just like the block length property. The only key identifying a block is the block id.

          Tsz Wo Nicholas Sze added a comment -

          > The way I look at this problem is that I'd like to treat a generation stamp as a property of a block just as the property block length. The only key to identify a block is the block id.

          If this were a new class, you would be right that we can view it that way. However, Block is an existing base class and it has been there for a very long time. We cannot simply view it one way yesterday and another way today. Also, it belongs to the org.apache.hadoop.hdfs.protocol package, so this may potentially break some external applications.

          Inside hdfs, there are

          • HashMap<Block, BlockScanInfo> blockMap in DataBlockScanner
          • Map<Block, BalancerBlock> globalBlockList in Balancer
          • List<HashMap<Block, BalancerBlock>> movedBlocks in Balancer.MovedBlocks
          • Map<Block, PendingBlockInfo> pendingReplications in PendingReplicationBlocks
          • List<TreeSet<Block>> priorityQueues in UnderReplicatedBlocks
          • BlockInfo extends Block
          • BlockMetaDataInfo extends Block
          • ...

          Have you considered the implication to them?

          Hairong Kuang added a comment -

          > Block is an existing base class and it is there for a very long time. We cannot simply view it in one way yesterday and view it in another way today.
          I agree that Block is a long-existing base class. But I disagree that we cannot make any changes to it. Adding the generation stamp as part of its key was introduced by the append project. As I think more about the new append design, I feel that this was a design flaw and has caused many problems, because we could not handle multiple generation stamps. That's why I want to make the proposed change.

          As I said in the description of this jira, this change is based on the following facts: (1) On each datanode only one replica of block exists. Therefore there is only one generation of a block. (2) NameNode has only one entry for a block in its blocks map.

          So in all the maps that you mentioned, there should be only one entry of a block per block id. In either NN or DN, there is only one entry of blockInfo or replicaInfo per block. I do not think changing the key of the Block should cause any problem to all these data structures in dfs. If there are two generations of a block in those data structures, this is an error. Whether the key contains generation stamp or not, dfs should handle this error.

          I agree this may break some external applications. We could put this change in the release note.

          Tsz Wo Nicholas Sze added a comment -

          Hairong, if my concerns are carefully considered, I won't veto the changes, although they are still very dangerous.

          I suggest keeping WILDCARD_STAMP and using it for a sanity check as shown below.

          //Block
            public boolean equals(Object o) {
              ...
              final Block that = (Block)o;
              if (this.generationStamp != that.generationStamp && that.generationStamp != WILDCARD_STAMP) {
                throw new IllegalArgumentException(..);
              }
              return ...
            }
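          A runnable reading of that sketch, with the elided parts filled in illustratively. The WILDCARD_STAMP value here is an assumption (not the actual HDFS constant), and the check is narrowed to same-id comparisons so that unrelated blocks still compare without throwing:

```java
// Illustrative, self-contained reading of the sanity-check sketch above.
// WILDCARD_STAMP's value is an assumption; field names are simplified.
class Block {
  static final long WILDCARD_STAMP = 1;

  private final long blockId;
  private final long generationStamp;

  Block(long blockId, long generationStamp) {
    this.blockId = blockId;
    this.generationStamp = generationStamp;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Block)) return false;
    final Block that = (Block) o;
    // Sanity check: two concrete blocks with the same id but different,
    // non-wildcard generation stamps indicate a bug somewhere.
    if (this.blockId == that.blockId
        && this.generationStamp != that.generationStamp
        && this.generationStamp != WILDCARD_STAMP
        && that.generationStamp != WILDCARD_STAMP) {
      throw new IllegalArgumentException(
          "generation stamp mismatch for block " + blockId);
    }
    return this.blockId == that.blockId;
  }

  @Override
  public int hashCode() {  // id-based, consistent with equals
    return Long.hashCode(blockId);
  }
}
```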
          
          Tsz Wo Nicholas Sze added a comment -

          For your reference, generation stamp and the change of Block.equals(..) were introduced by HADOOP-2656.

          Konstantin Shvachko added a comment -

          Most of the maps Nicholas listed above are HashMaps. They are based on the Block.hash() method, which is not modified by the patch and has never used the generation stamp in calculating a block's hash. I found only 4 maps which use TreeSet<Block> or TreeMap with Block as the key. Here they are:

          1. UnderReplicatedBlocks
          2. BlockManager.excessReplicateMap
          3. CorruptReplicasMap
          4. DatanodeDescriptor.invalidateBlocks

          None of them needs to know about the generation stamp.
          I think it is safe to make the change. We should commit it to the append branch.

          Additional comments:

          • getReplicaInfo() adds generation stamp checking. I don't think this is necessary.
          • the comment // ... ignore generation stamp!!! is misleading and should be removed.
          • ReplicaInfo.setGenStamp(), getGenStamp() should rather be called setGenerationStamp(), getGenerationStamp()
          • Why does ReplicaInfo need a genStamp field? Don't we always have it in Block? If we do, could you please add a comment clarifying what this field actually is.
          Tsz Wo Nicholas Sze added a comment -

          > Most of the maps Nicholas listed above are HashMaps. They are based on the Block.hash() method ...

          HashMap, like any class implementing Map, also uses equals(..). See Map API.
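          The point that HashMap lookups go through equals(..) after bucketing by hashCode() can be demonstrated with a toy key class (hypothetical, not HDFS code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy key: hashes by id only, but equals(..) also compares the gen stamp
// (hypothetical class, not the HDFS Block).
class IdGsKey {
  final long id, gs;
  IdGsKey(long id, long gs) { this.id = id; this.gs = gs; }

  @Override public int hashCode() { return Long.hashCode(id); } // id only

  @Override public boolean equals(Object o) {                   // id + gs
    if (!(o instanceof IdGsKey)) return false;
    IdGsKey that = (IdGsKey) o;
    return this.id == that.id && this.gs == that.gs;
  }
}

class HashMapEqualsDemo {
  public static void main(String[] args) {
    Map<IdGsKey, String> map = new HashMap<>();
    map.put(new IdGsKey(42, 1001), "replica");
    // Same hash bucket (same id), but equals(..) also compares gs,
    // so a probe with the wrong generation stamp misses.
    System.out.println(map.containsKey(new IdGsKey(42, 9999))); // false
    System.out.println(map.containsKey(new IdGsKey(42, 1001))); // true
  }
}
```

          So even though the hash never used the generation stamp, every HashMap lookup still depends on what equals(..) compares.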

          Konstantin Shvachko added a comment -

          I was looking at compareTo(), which is used in TreeMap. HashMap uses equals(), so yes, all maps are affected.
          I still think the change makes sense because we don't want to treat the same block with different generation stamps as separate blocks.
          By the way, equals() should probably be implemented via compareTo() so that they are always in sync.

          Tsz Wo Nicholas Sze added a comment -

          > I was looking at compareTo(), which is used in TreeMap. ...
          Even for TreeMap, it is good to implement equals(..) correctly, as described by the TreeMap API:

          .. The behavior of a sorted map is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Map interface.

          > By the way, equals() should probably be implemented via compareTo() so that they are always in sync.
          Agree.

          Hairong Kuang added a comment -

          > Why does ReplicaInfo need genStamp field. Don't we always have it in Block? If we do could you please add a comment clarifying what this field actually is.
          Now the key is the block id, so the entry returned from the map may have a different generation stamp. There is no efficient way to get the key back out of the map, so I added the generation stamp to the value field.

          Raghu Angadi added a comment -

          I faced this problem before: you can't easily get the actual block from the block info in the DN. This is probably a good time to add a Block to ReplicaInfo. I think it makes sense, and we don't need to add a genStamp field and a genstamp interface there. We also don't have to keep the genStamp in Block and in ReplicaInfo in sync.

          dhruba borthakur added a comment -

          There are some advantages to using the generation stamp as part of the unique identifier for a Block object. This ensures that all code correctly recognizes that blocks with different generation stamps are different blocks and can have different contents. It might not be a big deal for NN data structures, especially because the NN first checks whether a block belongs to a file before inserting it into the BlocksMap. But for external tools that use a block interface (e.g. Balancer, fsck, etc.), it might be helpful for them to understand that blocks with different generation stamps are different blocks (do these utilities use the Block object at all?)

          @Raghu: > This is probably a good time to add Block to ReplicaInfo.

          If we follow Raghu's suggestion, then can we continue using the genstamp as part of the Block key?

          There are other cases, (especially during block report processing) where we would have to do wild-card lookups for a block. But the cost of these extra lookup calls might be minimal because they will be in the error-code-path only.

          Hairong Kuang added a comment -

          > This is probably a good time to add Block to ReplicaInfo.
          I already redefined ReplicaInfo in HDFS-509 where ReplicaInfo extends Block. This patch provides only a temporary fix if HDFS-509 gets committed later.

          > If we follow Raghu's suggestion, can we continue using the genstamp as part of the Block key?
          In both NN and DN, blocksMap and replicasMap do not need the gen stamp as part of the Block key. A DN can serve a read if it has a replica with a gen stamp equal to or newer than the one in the request. With the new design, if a DN receives a write request with a recovery flag, it can satisfy the request if it has a replica with a gen stamp equal to or newer than the one in the write request.

          The key question is whether HDFS has any maps or sets that need to contain two instances of Block with the same block id but different gen stamps. If the answer is no, we can safely remove the gen stamp from the Block key.

          dhruba borthakur added a comment -

          > does HDFS have any maps or sets that need to contain two instances of Block with the same block id but different gen stamps

          I agree that this is not the case. So let's get rid of the genstamp from the Block key. Can we still use it in the Block.equals() method?

          Konstantin Shvachko added a comment -

          > let's get rid of the genstamp from the Block key. Can we still use it in the Block.equals() method?

          I agree that the block is identified by the blockID only in all maps. This, by the way, should simplify life for external tools too.
          We cannot have semantically different implementations of compareTo() and equals() methods. This is what we discussed earlier with Nicholas. equals() should be implemented as compareTo() == 0. compareTo() is used by TreeMaps, while HashMaps are based on equals(), so they have to be in sync if we want the same indexing by different types of maps.

          dhruba borthakur added a comment -

          Yes, sounds good to me.

          Tsz Wo Nicholas Sze added a comment -

          > We cannot have semantically different implementations of compareTo() and equals() methods. ...
          We indeed could have two different orderings, since we may pass a Comparator object to the map. See this TreeMap constructor.
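          A sketch of that option, using a simplified stand-in Block (not the HDFS class): equals(..) can keep comparing id + gs while the TreeMap is constructed with an id-only Comparator:

```java
import java.util.Comparator;
import java.util.TreeMap;

// Simplified stand-in Block (not the HDFS class): keeps id and gs.
class Block {
  final long id;
  final long gs;
  Block(long id, long gs) { this.id = id; this.gs = gs; }
}

class IdOnlyOrdering {
  // Order Blocks by id alone, ignoring the generation stamp.
  static final Comparator<Block> BY_ID = Comparator.comparingLong(b -> b.id);

  public static void main(String[] args) {
    TreeMap<Block, String> map = new TreeMap<>(BY_ID);
    map.put(new Block(7, 100), "old");
    // Same id, newer gs: the comparator sees the same key,
    // so this replaces the entry instead of adding a second one.
    map.put(new Block(7, 101), "new");
    System.out.println(map.size());               // 1
    System.out.println(map.get(new Block(7, 0))); // new (gs ignored in lookup)
  }
}
```

          Note this buys id-only keying for the TreeMaps without touching Block's own equals(..)/compareTo(..); the HashMaps would still need separate handling.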

          Konstantin Shvachko added a comment -

          You are not proposing doing it or are you?

          Tsz Wo Nicholas Sze added a comment -

          I am proposing we should consider this option if it is not yet considered.

          Konstantin Shvachko added a comment -

          I am confused. What do you want to use two different comparators for?

          Tsz Wo Nicholas Sze added a comment -

          > .. What do you want to use two different comparators for?
          We may keep the current Block.compareTo(..) and Block.equals(..) implementations (i.e. they use id and gs) and then define a new Comparator<Block> for comparing only the id.

          Konstantin Shvachko added a comment -

          I understand we can implement multiple comparators. But my question is what are the use cases for them? You propose two: where do you plan to use each of them?

          Tsz Wo Nicholas Sze added a comment -

          From my understanding, we want to change some map keys from id + gs to id. Then, we have a few options
          (1) change Block.compareTo(..) and Block.equals(..) to compare on id.
          (2) change class of the map keys from Block to something else, say BlockId, which only uses id (but not gs) to implement equals (and compareTo).
          (3) use TreeMap with Block as the key but define a new Comparator which compares Blocks by id only.

          Current patch implements option (1). We may consider options (2) and (3).
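          Option (2) might be sketched as follows; BlockId is the name suggested above, and the rest is illustrative:

```java
// Option (2) sketched: a dedicated key class that compares by id only,
// leaving Block.equals(..) untouched. "BlockId" is the name suggested
// in the discussion; the implementation details are assumptions.
final class BlockId implements Comparable<BlockId> {
  private final long id;

  BlockId(long id) { this.id = id; }

  @Override
  public int compareTo(BlockId that) {
    return Long.compare(this.id, that.id);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof BlockId && ((BlockId) o).id == id;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(id);
  }
}
```

          A map such as Map<BlockId, BlockInfo> would then key by id regardless of generation stamp, while Block's own equals(..) stays unchanged for external applications.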

          Tsz Wo Nicholas Sze added a comment -

          I forgot to say that my preference would be (2) > (3) > (1) because

          • (1) is too dangerous to get everything correct and is an incompatible change; and
          • (3) seems to require more code changes than (2).
          Tsz Wo Nicholas Sze added a comment -

          There is a concern about whether compareTo(..) has to be consistent with equals(..). The answer is "strongly recommended but not required". See the Comparable API.

          It is strongly recommended (though not required) that natural orderings be consistent with equals. ...

          Konstantin Shvachko added a comment -

          > (1) is too dangerous to get everything correct

          I want to see at least one example where the current implementation (id+gs) of the block key is critical.

          > and is an incompatible change;

          I agree it is. BTW, nobody complained about changing it from id to (id+gs).

          > (2) change class of the map keys from Block to something else, say BlockId

          This is going to cost us. Currently a BlocksMap entry's key and value are the same object, so the key is represented by a reference only. In your proposal (2), replacing the key with a BlockID means a new object, which will take more space than a reference.
          Unless you implement BlockInfo.equals() differently. Right now it simply refers to the superclass method Block.equals().
          BlockInfo is not a public API and we can change it, but I still don't know why we would want to increase the complexity of the code, except for the hypothetical suggestion that it may cause problems.

          Tsz Wo Nicholas Sze added a comment -

          > I want to see at least one example where current the implementation (id+gs) of the block key is critical.
          I did not say that I got a good example for using id+gs.

          > (1) is too dangerous to get everything correct
          Typo: it should be "(1) is too dangerous: it is very difficult to get everything correct"
          I mean (1) involves too much. I even have a hard time listing out all the affected components in my previous comment.

          BTW, I am not against (1) and I am not the assignee of this issue. Please go ahead if you have fully considered other alternatives. I think the assignee has the freedom to choose a solution.

          Konstantin Shvachko added a comment -

          Here is another variant of the patch.

          • I changed Block.compareTo() to compare only the block Ids, and Block.equals() now returns compareTo() == 0.
          • I removed FSNamesystem.findBlock(id), since it is not required any more, and the pieces of the code that depended on non-equal generation stamps.
          • Found that we have two identical getGenerationStampFromFile() methods in FSDataset; removed the one that has no references.
          • Unlike the previous patch, I think it is better to keep the GenerationStamp class around. It holds generation-stamp-related constants and implements nextStamp(), setStamp(). I don't see why it needs to be Writable.
          Konstantin Shvachko added a comment -

          I ran test-core and test-patch myself. It works. Need to run map-reduce tests to make sure nothing is affected by this change.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12422175/blockIdAsKey.patch
          against trunk revision 825689.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/27/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/27/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/27/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/27/console

          This message is automatically generated.

          Hairong Kuang added a comment -

          FSNamesystem.java has code that removes a BlockInfo entry from the blocksMap, changes its generation stamp, and adds it back, for example in updatePipeline and commitBlockSynchronization. Now that the generation stamp is not part of the key, that code is no longer necessary.
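A minimal sketch of why the remove-and-re-add dance becomes unnecessary once the map is keyed by block id alone. BlockInfo and BlocksMapSketch here are hypothetical stand-ins for the HDFS classes, not the real implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the HDFS BlockInfo class (hypothetical, for illustration).
class BlockInfo {
  final long blockId;
  long generationStamp;
  BlockInfo(long id, long gs) { blockId = id; generationStamp = gs; }
}

class BlocksMapSketch {
  // Keyed by block id only, mirroring the change in this issue.
  private final Map<Long, BlockInfo> map = new HashMap<>();

  void add(BlockInfo b) { map.put(b.blockId, b); }

  // Old pattern (id + gs as key): remove the entry, mutate the stamp,
  // re-insert so the entry is reachable under its new key.
  // New pattern (id-only key): mutate in place; lookups still succeed
  // because the key never changes.
  void updateGenerationStamp(long blockId, long newGS) {
    BlockInfo b = map.get(blockId);
    if (b != null) {
      b.generationStamp = newGS;  // key unchanged, no re-insertion needed
    }
  }

  BlockInfo get(long blockId) { return map.get(blockId); }
}
```

The same reasoning applies to updatePipeline and commitBlockSynchronization: only the stamp field changes, never the map key.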

          Konstantin Shvachko added a comment -

          This version merges the patch with the current trunk.
          I also fixed updatePipeline() as Hairong suggested.
          I did not fix commitBlockSynchronization(); I think that should be a refactoring task for another jira.

          Konstantin Shvachko added a comment -

          I ran tests for HDFS and MapReduce, including contrib tests for both projects. Everything worked fine with the new keys. Also talked to Devaraj, who could not find a reason why MR would care about blocks' generation stamps.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12422973/blockIdAsKey.patch
          against trunk revision 828926.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/54/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/54/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/54/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/54/console

          This message is automatically generated.

          Hairong Kuang added a comment -

          +1. The patch looks good to me.

          Konstantin Shvachko added a comment -

          I just committed this.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #81 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/81/)
          . Block.equals() and compareTo() compare blocks based only on block Ids, ignoring generation stamps. Contributed by Konstantin Shvachko.

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #59 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/59/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #120 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/120/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #78 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/78/)


            People

            • Assignee:
              Konstantin Shvachko
            • Reporter:
              Hairong Kuang
            • Votes:
              0
            • Watchers:
              8