HBase
  1. HBase
  2. HBASE-10070 HBase read high-availability using timeline-consistent region replicas
  3. HBASE-10704

BaseLoadBalancer#roundRobinAssignment() may add same region to assignment plan multiple times

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, hbase-10070
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I noticed the following exception in some unit tests:

      2014-03-09 03:38:13,523 WARN  [s111-s2.cs1cloud.internal,57347,1394350359795-GeneralBulkAssigner-2] master.GeneralBulkAssigner$SingleServerBulkAssigner(232): Failed bulking assigning 18 region(s) to s111.internal,46094,1394350360588, and continue to bulk assign others
      java.lang.NullPointerException
      	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1505)
      	at org.apache.hadoop.hbase.master.GeneralBulkAssigner$SingleServerBulkAssigner.run(GeneralBulkAssigner.java:228)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:722)
      

      Here is related code:

                Lock lock = locks.remove(encodedName);
                lock.unlock();
      

      lock was null due to BaseLoadBalancer#roundRobinAssignment() adding same region to assignment plan multiple times.

      This happens in computing lastFewRegions where cluster.wouldLowerAvailability() returns true.

      1. 0018-HBASE-10704-BaseLoadBalancer-roundRobinAssignment-ma.patch
        2 kB
        Enis Soztutar
      2. 10704-v1.txt
        2 kB
        Ted Yu
      3. 10704-v2.txt
        1 kB
        Ted Yu
      4. hbase-10704_v3.patch
        1 kB
        Enis Soztutar

        Activity

        Hide
        Ted Yu added a comment -

        Patch v1 changes lastFewRegions to a Set.

        Balancer related tests pass.

        Show
        Ted Yu added a comment - Patch v1 changes lastFewRegions to a Set. Balancer related tests pass.
        Hide
        Devaraj Das added a comment -

        Ted Yu, we shouldn't end up in a situation where the balancer adds the same region multiple times.. Do you know what led to this situation in the first place.

        Show
        Devaraj Das added a comment - Ted Yu , we shouldn't end up in a situation where the balancer adds the same region multiple times.. Do you know what led to this situation in the first place.
        Hide
        Ted Yu added a comment -

        I discovered the NullPointerException when I was debugging a unit test where table is created with:

            hdt.setRegionReplication(3);
        

        I traced the duplicate regions to BaseLoadBalancer#roundRobinAssignment()

        Show
        Ted Yu added a comment - I discovered the NullPointerException when I was debugging a unit test where table is created with: hdt.setRegionReplication(3); I traced the duplicate regions to BaseLoadBalancer#roundRobinAssignment()
        Hide
        Enis Soztutar added a comment -

        The patch sidelines the problem, but does not fix the root cause. It will be good if we can identify the actual root cause. Do you have the full logs from the test?

        Show
        Enis Soztutar added a comment - The patch sidelines the problem, but does not fix the root cause. It will be good if we can identify the actual root cause. Do you have the full logs from the test?
        Hide
        Ted Yu added a comment -

        The cause for duplicate entries was that region deemed to lower availability would be inserted more than once by the following loop:

              for (int j = 0; j < numServers; j++) { // try all servers one by one
        

        Patch v2 provides a simpler fix.

        Looks like the above loop is not needed.

        Show
        Ted Yu added a comment - The cause for duplicate entries was that region deemed to lower availability would be inserted more than once by the following loop: for ( int j = 0; j < numServers; j++) { // try all servers one by one Patch v2 provides a simpler fix. Looks like the above loop is not needed.
        Hide
        Enis Soztutar added a comment -

        in v2, we do not execute the inner for loop at all, since we exit after the first try no matter what.
        Please check out v3, which should fix the problem.

        Show
        Enis Soztutar added a comment - in v2, we do not execute the inner for loop at all, since we exit after the first try no matter what. Please check out v3, which should fix the problem.
        Hide
        Devaraj Das added a comment -

        +1

        Show
        Devaraj Das added a comment - +1
        Hide
        Enis Soztutar added a comment -

        Attaching rebased patch for master that is committed

        Show
        Enis Soztutar added a comment - Attaching rebased patch for master that is committed
        Hide
        Enis Soztutar added a comment -

        Committed to master as part of hbase-10070 branch merge

        Show
        Enis Soztutar added a comment - Committed to master as part of hbase-10070 branch merge
        Hide
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5245 (See https://builds.apache.org/job/HBase-TRUNK/5245/)
        HBASE-10704 BaseLoadBalancer#roundRobinAssignment() may add same region to assignment plan multiple times (enis: rev 55a7c872948c583dcbf4c82ebca047dc21e03b98)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
        Show
        Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5245 (See https://builds.apache.org/job/HBase-TRUNK/5245/ ) HBASE-10704 BaseLoadBalancer#roundRobinAssignment() may add same region to assignment plan multiple times (enis: rev 55a7c872948c583dcbf4c82ebca047dc21e03b98) hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
        Hide
        Enis Soztutar added a comment -

        Closing this issue after 0.99.0 release.

        Show
        Enis Soztutar added a comment - Closing this issue after 0.99.0 release.

          People

          • Assignee:
            Ted Yu
            Reporter:
            Ted Yu
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development