Hadoop HDFS / HDFS-613

TestBalancer and TestBlockTokenWithDFS fail Balancer assert

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Running TestBalancer with asserts on triggers the assert in Balancer.chooseNode(), and the test fails. We do not see this in the builds because asserts are off there. So either the assert is irrelevant or there is another bug in the Balancer code.
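
      For context on why the builds do not show the failure: a Java assert statement is evaluated only when the JVM runs with assertions enabled (the -ea flag). The following is a minimal sketch, not taken from the Hadoop code, that reports whether assertions are on. The class name AssertStatus and its method are hypothetical.

```java
// Hypothetical helper, not part of Hadoop: detects whether the JVM
// is running this class with assertions enabled.
class AssertStatus {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert runs only when assertions
        // are enabled; otherwise the whole statement is skipped.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

      Started with java -ea AssertStatus, the assert statement executes and the method returns true; started without the flag, the statement is skipped and the method returns false.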

      1. hdfs-613.txt
        1.0 kB
        Todd Lipcon

        Activity

        Konstantin Shvachko created issue -
        Konstantin Shvachko added a comment -

        This is the assertion:
        AssertionError: Mismatched number of datanodes

        It is in the Balancer.chooseNode() method without parameters.

        dhruba borthakur added a comment -

        Hi Konstantin, is it possible to run all unit tests with asserts on? If so, what is the exact command that I can run on the command line to run all unit tests with asserts switched on?

        Tom White made changes -
        Field Original Value New Value
        Fix Version/s 0.21.0 [ 12314046 ]
        Eli Collins added a comment -

        They're on by default now. I didn't see this one when enabling asserts in hdfs because in the tests it gets reported as a junit assertion error instead of a java.lang one:

        Mismatched number of datanodes
        junit.framework.AssertionFailedError: Mismatched number of datanodes

        Here's the assert that's firing:

           assert (datanodes.size() ==
              overUtilizedDatanodes.size()+underUtilizedDatanodes.size()+
              aboveAvgUtilizedDatanodes.size()+belowAvgUtilizedDatanodes.size()+
              sources.size()+targets.size())
              : "Mismatched number of datanodes";
        
        Eli Collins made changes -
        Fix Version/s 0.22.0 [ 12314241 ]
        Affects Version/s 0.22.0 [ 12314241 ]
        Affects Version/s 0.21.0 [ 12314046 ]
        Eli Collins added a comment -

        TestBlockTokenWithDFS fails for the same reason.

        Eli Collins made changes -
        Summary TestBalancer fails with -ea option. TestBalancer and TestBlockTokenWithDFS fail Balancer assert
        Todd Lipcon made changes -
        Assignee Todd Lipcon [ tlipcon ]
        Todd Lipcon added a comment -

        I think the assert itself is at fault here. It assumes that after datanodes have been moved to the sources/targets list, they're not left in the underUtilized list. This isn't necessarily the case: because of rounding error, they might have excess "move quota" and so still be in the list.

        This patch changes the assert to simply check that the sum of sources and targets doesn't add up to more than the number of DNs. This fixes both TestBlockTokenWithDFS and TestBalancer.
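
        The difference between the two checks can be illustrated with a simplified sketch. This is not the actual Balancer code or the attached patch: the class BalancerAssertSketch, the use of plain string lists, and the node names dn1 and dn2 are assumptions made for illustration. Only the list names and the two conditions follow the comments above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Balancer's bookkeeping lists.
class BalancerAssertSketch {
    final List<String> datanodes = new ArrayList<>();
    final List<String> overUtilizedDatanodes = new ArrayList<>();
    final List<String> underUtilizedDatanodes = new ArrayList<>();
    final List<String> aboveAvgUtilizedDatanodes = new ArrayList<>();
    final List<String> belowAvgUtilizedDatanodes = new ArrayList<>();
    final List<String> sources = new ArrayList<>();
    final List<String> targets = new ArrayList<>();

    // Condition of the original assert: every datanode sits in exactly one list.
    boolean strictCheck() {
        return datanodes.size() ==
            overUtilizedDatanodes.size() + underUtilizedDatanodes.size()
            + aboveAvgUtilizedDatanodes.size() + belowAvgUtilizedDatanodes.size()
            + sources.size() + targets.size();
    }

    // Relaxed condition as described in the comment: sources plus targets
    // must not exceed the number of datanodes.
    boolean relaxedCheck() {
        return sources.size() + targets.size() <= datanodes.size();
    }

    // Scenario from the comment: dn2 was chosen as a target but still has
    // leftover "move quota", so it also remains in the underUtilized list.
    static BalancerAssertSketch leftoverQuotaScenario() {
        BalancerAssertSketch s = new BalancerAssertSketch();
        s.datanodes.add("dn1");
        s.datanodes.add("dn2");
        s.sources.add("dn1");
        s.targets.add("dn2");
        s.underUtilizedDatanodes.add("dn2");
        return s;
    }

    public static void main(String[] args) {
        BalancerAssertSketch s = leftoverQuotaScenario();
        System.out.println("strict check holds:  " + s.strictCheck());
        System.out.println("relaxed check holds: " + s.relaxedCheck());
    }
}
```

        In this scenario dn2 is counted twice on the right-hand side of the strict equality (3 entries for 2 datanodes), so the strict check fails while the relaxed check still holds.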

        Todd Lipcon made changes -
        Attachment hdfs-613.txt [ 12465652 ]
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Konstantin Boudnik added a comment -

        Looking into this separately, I came to pretty much the same conclusion: the assert in question is incorrect.

        +1 on Todd's patch.

        Konstantin Boudnik added a comment -

        I have just committed this to trunk and 0.22. Thanks Todd!

        Konstantin Boudnik made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/)

        Konstantin Shvachko made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                   Time In Source Status  Execution Times  Last Executer        Last Execution Date
        Open -> Patch Available      451d 2h 44m            1                Todd Lipcon          07/Dec/10 01:59
        Patch Available -> Resolved  2d 17h 31m             1                Konstantin Boudnik   09/Dec/10 19:30
        Resolved -> Closed           367d 10h 48m           1                Konstantin Shvachko  12/Dec/11 06:18

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Konstantin Shvachko
          • Votes:
            0
            Watchers:
            4
