  1. HBase
  2. HBASE-12440

Region may remain offline on clean startup under certain race condition

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.8, 0.99.2
    • Component/s: Region Assignment
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Saw this in production some time back with ZK-based assignment.
      On a clean startup, one of the region servers died while the master was doing a bulk assign. The bulk assigner then tried to assign the affected region individually using AssignCallable. AssignCallable does a forceStateToOffline() and then skips the assignment, because it expects the SSH (ServerShutdownHandler) to do it:

      2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
      2014-10-16 16:05:23,593  INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016}
      2014-10-16 16:05:23,598  INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
      

      But the SSH won't assign the region, because by then it is offline but not in transition:

      2014-10-16 16:05:24,606  INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server)
      2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
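
      The sequence in the two log excerpts above can be reproduced with a small toy model. This is only a sketch of the reported interleaving, not HBase code: the class, its fields, and the method bodies are hypothetical stand-ins named after the components mentioned in this report, and the assumption that the shutdown handler only picks up regions still in transition on the dead server is taken from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types or methods are real HBase classes.
class OfflineRegionRace {
    enum State { OFFLINE, PENDING_OPEN }

    final Map<String, State> states = new HashMap<>();
    final Map<String, String> serverOf = new HashMap<>();
    final Set<String> inTransition = new HashSet<>();
    final Set<String> deadNotProcessed = new HashSet<>();

    // Bulk assign: the region is sent to a server and waits to open there.
    void bulkAssign(String region, String server) {
        states.put(region, State.PENDING_OPEN);
        serverOf.put(region, server);
        inTransition.add(region);
    }

    // The server has died, but the shutdown handler has not run yet.
    void serverDied(String server) {
        deadNotProcessed.add(server);
    }

    // Single-region retry after the bulk assign failed. Returns true if it assigned.
    boolean assignCallable(String region, String fallbackServer) {
        // Modelled forceStateToOffline(): OFFLINE and (assumed) no longer in transition.
        states.put(region, State.OFFLINE);
        inTransition.remove(region);
        if (deadNotProcessed.contains(serverOf.get(region))) {
            return false; // skip: leave the region to the shutdown handler
        }
        bulkAssign(region, fallbackServer);
        return true;
    }

    // Modelled shutdown handler: reassigns only regions still in transition on the dead server.
    List<String> serverShutdownHandler(String deadServer, String fallbackServer) {
        List<String> reassigned = new ArrayList<>();
        for (String region : new ArrayList<>(inTransition)) {
            if (deadServer.equals(serverOf.get(region))) {
                bulkAssign(region, fallbackServer);
                reassigned.add(region);
            }
        }
        deadNotProcessed.remove(deadServer);
        return reassigned;
    }

    public static void main(String[] args) {
        OfflineRegionRace m = new OfflineRegionRace();
        m.bulkAssign("r1", "rs1");
        m.serverDied("rs1");
        boolean assigned = m.assignCallable("r1", "rs2");
        List<String> reassigned = m.serverShutdownHandler("rs1", "rs2");
        System.out.println("AssignCallable assigned: " + assigned);   // false
        System.out.println("SSH reassigned: " + reassigned.size());   // 0
        System.out.println("final state: " + m.states.get("r1"));     // OFFLINE
    }
}
```

      In this model each side defers to the other: the retry skips because the server is dead but unprocessed, and the handler finds nothing in transition, so the region stays OFFLINE with no owner.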
      

      In ZK-less assignment, both the bulk assigner (through AssignCallable) and the SSH may try to assign the region. But since both go through a lock, only one of them will succeed, so this doesn't seem to be an issue there.
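
      A minimal sketch of why two competing assigners are harmless when both take a lock first. The class and method names are hypothetical, and a plain ReentrantLock stands in for whatever locking the master actually uses; the point is only that the second caller sees the state already changed and backs off.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: not an HBase class.
class LockedAssign {
    enum State { OFFLINE, PENDING_OPEN }

    private final ReentrantLock regionLock = new ReentrantLock();
    private State state = State.OFFLINE;

    // Assigns the region only if it is still OFFLINE once the lock is held.
    boolean tryAssign() {
        regionLock.lock();
        try {
            if (state != State.OFFLINE) {
                return false; // someone else already assigned it
            }
            state = State.PENDING_OPEN;
            return true;
        } finally {
            regionLock.unlock();
        }
    }

    // Runs two competing assigners and returns how many of them succeeded.
    static int race() {
        LockedAssign region = new LockedAssign();
        AtomicInteger winners = new AtomicInteger();
        Runnable attempt = () -> {
            if (region.tryAssign()) {
                winners.incrementAndGet();
            }
        };
        Thread a = new Thread(attempt, "AssignCallable");
        Thread b = new Thread(attempt, "SSH");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("successful assigns: " + race()); // 1
    }
}
```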

      Attachments

        1. HBASE-12440-0.98_v2.patch
          10 kB
          Virag Kothari
        2. HBASE-12440-branch-1.patch
          11 kB
          Virag Kothari
        3. HBASE-12440-0.98.patch
          11 kB
          Virag Kothari

          People

            Assignee: Virag Kothari
            Reporter: Virag Kothari