Hadoop HDFS / HDFS-8204

Mover/Balancer should not schedule two replicas to the same DN

Details

    • Reviewed

    Description

      Before version 2.6, the Balancer moved blocks between Datanodes.
      Since version 2.6, the Balancer moves blocks between StorageGroups (introduced by HDFS-6584).
      The function

      class DBlock extends Locations<StorageGroup>
      DBlock.isLocatedOn(StorageGroup loc)
      

      is flawed and may cause 2 replicas to end up on the same node after running the Balancer.

      For example:
      Suppose we have 2 nodes, each with two storages:
      (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
      We have a block with the ONE_SSD storage policy.
      The block has 2 replicas, located on (DN0,SSD) and (DN1,DISK).
      The replica on (DN0,SSD) should not be moved to (DN1,SSD) when the Balancer runs.
      Otherwise DN1 would hold 2 replicas.
      --------------
      UPDATE (thanks to Tsz-wo Sze for pointing this out):

      This bug does NOT cause 2 replicas to end up on the same node after running the Balancer, because the Datanode rejects the move.

      However, we see many ERROR messages such as the following when running the test:

      2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  src: /127.0.0.1:52532 dst: /127.0.0.1:59537
      org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
          at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
          at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
          at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
          at java.lang.Thread.run(Thread.java:722)
      

      In the test, the Balancer runs 5~20 iterations before it exits.
      This is inefficient.
      The Balancer should not schedule such a move in the first place, even though the move would fail anyway. In the test, it should exit after 5 iterations.
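
The difference between a storage-level and a node-level location check can be sketched with the example from the description. This is a simplified model, not the actual Balancer code: the issue does not show the body of `DBlock.isLocatedOn`, so the sketch assumes it compares (datanode, storage type) pairs. The names `ReplicaPlacementSketch`, `Block`, `isLocatedOnStorage`, and `isLocatedOnDatanode` are hypothetical and exist only for illustration, as is this stand-in `StorageGroup` class.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the scenario in this issue; NOT the real Balancer classes. */
class ReplicaPlacementSketch {

    /** Hypothetical stand-in for a (datanode, storage type) pair. */
    static final class StorageGroup {
        final String datanode;
        final String storageType;

        StorageGroup(String datanode, String storageType) {
            this.datanode = datanode;
            this.storageType = storageType;
        }
    }

    /** Hypothetical stand-in for a block and the storage groups holding its replicas. */
    static final class Block {
        final List<StorageGroup> locations = new ArrayList<>();

        /** Storage-level check (assumed behaviour): true only if this exact pair holds a replica. */
        boolean isLocatedOnStorage(StorageGroup target) {
            for (StorageGroup loc : locations) {
                if (loc.datanode.equals(target.datanode)
                        && loc.storageType.equals(target.storageType)) {
                    return true;
                }
            }
            return false;
        }

        /** Node-level check: true if any storage on the target's datanode holds a replica. */
        boolean isLocatedOnDatanode(StorageGroup target) {
            for (StorageGroup loc : locations) {
                if (loc.datanode.equals(target.datanode)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Builds the block from the example: replicas on (DN0,SSD) and (DN1,DISK). */
    static Block exampleBlock() {
        Block block = new Block();
        block.locations.add(new StorageGroup("DN0", "SSD"));
        block.locations.add(new StorageGroup("DN1", "DISK"));
        return block;
    }

    public static void main(String[] args) {
        Block block = exampleBlock();
        StorageGroup target = new StorageGroup("DN1", "SSD");
        System.out.println("storage-level check: " + block.isLocatedOnStorage(target));
        System.out.println("node-level check:    " + block.isLocatedOnDatanode(target));
    }
}
```

For the target (DN1,SSD), the storage-level check reports false, so a scheduler relying on it would treat the move as valid, while the node-level check reports true and would let the scheduler skip the move before the Datanode has to reject it.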

      Attachments

        1. HDFS-8204.001.patch
          3 kB
          Walter Su
        2. HDFS-8204.002.patch
          5 kB
          Walter Su
        3. HDFS-8204.003.patch
          5 kB
          Walter Su


          People

            Walter Su (walter.k.su)
            Votes: 0
            Watchers: 7
