Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
Description
Not yet sure how we got here, but:
2020-04-29 22:39:16,140 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure server= host-a.example.com,16020,1588033841562, splitWal=true, meta=true found a region state=OFFLINE, location=null, table=hbase:meta, region=1588230740 which is no longer on us host-a.example.com,16020,1588033841562, give up assigning...
The assignment manager gives up on this procedure and nothing can progress; manual intervention is necessary.
From the following conditional block, it seems the regionNode location is null.
// This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
// to us and then try to assign the region to a new RS. And before it has updated the region
// location to the new RS, we may have already called the am.getRegionsOnServer so we will
// consider the region is still on us. And then before we arrive here, the TRSP could have
// updated the region location, or even finished itself, so the region is no longer on us
// any more, we should not try to assign it again. Please see HBASE-23594 for more details.
if (!serverName.equals(regionNode.getRegionLocation())) {
  LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
    regionNode, serverName);
  continue;
}
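A minimal sketch of why a null location always trips this guard (the `ServerName` stand-in below is hypothetical, not HBase's real class): `equals(null)` returns false, so `!serverName.equals(location)` is true whenever the location is null, and the procedure skips the region even though it was never reassigned.

```java
public class NullLocationCheck {
    // Hypothetical stand-in for org.apache.hadoop.hbase.ServerName.
    static final class ServerName {
        final String name;
        ServerName(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof ServerName && ((ServerName) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // Mirrors the shape of the conditional above: a null region location can
    // never equal the crashed server's name, so the "give up" branch is taken.
    static boolean giveUpAssigning(ServerName crashedServer, ServerName regionLocation) {
        return !crashedServer.equals(regionLocation);
    }

    public static void main(String[] args) {
        ServerName crashed = new ServerName("host-a.example.com,16020,1588033841562");
        System.out.println(giveUpAssigning(crashed, null));    // true: give up assigning
        System.out.println(giveUpAssigning(crashed, crashed)); // false: region still on us
    }
}
```

This is why an OFFLINE meta region with `location=null` gets permanently skipped rather than reassigned.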
Issue Links
- relates to HBASE-23594 "Procedure stuck due to region happen to recorded on two servers." (Resolved)