Hadoop HDFS
HDFS-6506

Newly moved block replicas are invalidated and deleted in TestBalancer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0
    • Component/s: balancer & mover, test
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
      https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
      From the error log, the cause seems to be that newly moved block replicas were invalidated and deleted, so some of the balancer's work was reversed.

      2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
      2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
      2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
      2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
      2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
      2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
      2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
      2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr
      2014-06-06 18:15:54,706 INFO  BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set
      2014-06-06 18:15:54,709 INFO  BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set
      2014-06-06 18:15:56,421 INFO  BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
      2014-06-06 18:15:57,717 INFO  BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set
      2014-06-06 18:15:57,720 INFO  BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set
      2014-06-06 18:15:57,721 INFO  BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set
      2014-06-06 18:15:57,722 INFO  BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set
      2014-06-06 18:15:57,723 INFO  BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set
      2014-06-06 18:15:59,422 INFO  BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
      2014-06-06 18:16:02,423 INFO  BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021]
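
The pattern in the log above can be checked mechanically. The following is a minimal sketch, not part of the Balancer or the test: it scans log lines for blocks that were reported as moved to a target datanode and later added to the invalidated set on that same datanode. The class name `MovedThenInvalidated` and both regular expressions are assumptions based only on the message formats shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovedThenInvalidated {
    // Assumed format of the balancer's "Successfully moved ..." message.
    private static final Pattern MOVED = Pattern.compile(
        "Successfully moved (blk_\\d+_\\d+) with size=\\d+ from (\\S+) to (\\S+)");
    // Assumed format of the "chooseExcessReplicates" message.
    private static final Pattern INVALIDATED = Pattern.compile(
        "chooseExcessReplicates: \\((\\S+), (blk_\\d+_\\d+)\\) is added to invalidated blocks set");

    /** Returns blocks that were invalidated on the node they had just been moved to. */
    static List<String> find(List<String> logLines) {
        Map<String, String> targetOf = new HashMap<>(); // block -> move target
        List<String> reversed = new ArrayList<>();
        for (String line : logLines) {
            Matcher moved = MOVED.matcher(line);
            if (moved.find()) {
                targetOf.put(moved.group(1), moved.group(3));
                continue;
            }
            Matcher inv = INVALIDATED.matcher(line);
            if (inv.find() && inv.group(1).equals(targetOf.get(inv.group(2)))) {
                reversed.add(inv.group(2));
            }
        }
        return reversed;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159",
            "BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set");
        System.out.println(find(sample));
    }
}
```

A truncated line, such as the one for blk_1073741829_1005 above, does not match the move pattern, so that block is not reported even though it was invalidated on the target.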
      

      Normally this should not happen: when a block is moved from src to dest, the replica on src should be invalidated, not the one on dest. There is probably a bug in the related logic.
      I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this.
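
The expectation stated above can be written down as a small sketch. This is an illustration of the intended behavior only, not the BlockManager code: the class `ExcessReplicaSketch`, the method `chooseReplicaToDelete`, and the `moveSource` parameter are hypothetical names, and the fallback branch is a placeholder for whatever policy applies when no move source is known.

```java
import java.util.Arrays;
import java.util.List;

class ExcessReplicaSketch {
    /**
     * Picks the replica to drop when a block has one replica too many.
     *
     * @param holders    datanodes currently holding a replica (hypothetical model)
     * @param moveSource datanode the block was moved away from, or null if unknown
     */
    static String chooseReplicaToDelete(List<String> holders, String moveSource) {
        if (holders.isEmpty()) {
            throw new IllegalArgumentException("no replicas to choose from");
        }
        if (moveSource != null && holders.contains(moveSource)) {
            // Expected behavior after a move: drop the old copy on src, keep dest.
            return moveSource;
        }
        // Placeholder fallback (assumption): without a usable source, pick the last holder.
        return holders.get(holders.size() - 1);
    }

    public static void main(String[] args) {
        List<String> holders = Arrays.asList("127.0.0.1:49159", "127.0.0.1:55468");
        System.out.println(chooseReplicaToDelete(holders, "127.0.0.1:49159"));
    }
}
```

In this model the log above corresponds to the dest replica (127.0.0.1:55468) being chosen even though the move source was 127.0.0.1:49159, which is the opposite of the first branch.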

      1. HDFS-6506.v1.patch
        2 kB
        Binglin Chang
      2. HDFS-6506.v2.patch
        3 kB
        Binglin Chang
      3. HDFS-6506.v3.patch
        3 kB
        Binglin Chang

            People

            • Assignee:
              Binglin Chang
              Reporter:
              Binglin Chang
            • Votes:
              0
              Watchers:
              6
