Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
Description
Not yet sure how we got here, but:
2020-04-29 22:39:16,140 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure server= host-a.example.com,16020,1588033841562, splitWal=true, meta=true found a region state=OFFLINE, location=null, table=hbase:meta, region=1588230740 which is no longer on us host-a.example.com,16020,1588033841562, give up assigning...
The assignment manager gives up on this procedure and nothing can progress; manual intervention is necessary.
From the following conditional block, it seems the regionNode location is null.
// This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
// to us and then try to assign the region to a new RS. And before it has updated the region
// location to the new RS, we may have already called the am.getRegionsOnServer so we will
// consider the region is still on us. And then before we arrive here, the TRSP could have
// updated the region location, or even finished itself, so the region is no longer on us
// any more, we should not try to assign it again. Please see HBASE-23594 for more details.
if (!serverName.equals(regionNode.getRegionLocation())) {
  LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
    regionNode, serverName);
  continue;
}
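A minimal sketch of why a null location always trips this guard (the `ServerName` stand-in below is hypothetical, not HBase's real class): `equals(null)` returns false, so `!serverName.equals(location)` is true whenever the location is null, and the procedure skips the region even though it was never reassigned.

```java
public class NullLocationCheck {
    // Hypothetical stand-in for org.apache.hadoop.hbase.ServerName.
    static final class ServerName {
        final String name;
        ServerName(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof ServerName && ((ServerName) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // Mirrors the shape of the conditional above: a null region location can
    // never equal the crashed server's name, so the "give up" branch is taken.
    static boolean giveUpAssigning(ServerName crashedServer, ServerName regionLocation) {
        return !crashedServer.equals(regionLocation);
    }

    public static void main(String[] args) {
        ServerName crashed = new ServerName("host-a.example.com,16020,1588033841562");
        System.out.println(giveUpAssigning(crashed, null));    // true: give up assigning
        System.out.println(giveUpAssigning(crashed, crashed)); // false: region still on us
    }
}
```

This is why an OFFLINE meta region with `location=null` gets permanently skipped rather than reassigned.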
Issue Links
- relates to HBASE-23594 "Procedure stuck due to region happen to recorded on two servers." (Resolved)