Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 3.0.0-alpha-1, 1.4.13
- None
- None
- Reviewed
Description
We usually run graceful_stop.sh from the Master to restart RegionServers. However, in some scenarios we may not have the privileges needed to restart remote RegionServers, because the script reaches them over ssh.
In that case we can still run graceful_stop.sh locally, on the host we want to restart.
To detect that it is running on localhost, graceful_stop.sh uses /bin/hostname:
https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110
When RegionMover strips that host so it is not included in the list of target hosts, it filters it out by checking it against all RegionServer hosts in the cluster:
https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384
https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692
However, the RegionServer hosts returned by Admin#getRegionServers are FQDNs, while the hostname passed in by graceful_stop.sh is not, so the comparison fails.
The same happens with region_mover.rb on branch-1, which is where I reproduced the problem in my environment:
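The mismatch can be reproduced outside HBase with a few lines of shell. This is only an illustrative sketch: the host names and the `host_in_list` helper are hypothetical and are not taken from graceful_stop.sh or RegionMover. It assumes the server list holds one fully qualified name per line and that the lookup is an exact string match.

```shell
#!/usr/bin/env bash
# Hypothetical RegionServer list, one FQDN per line (illustration only).
servers="rs1.example.com
rs2.example.com"

# host_in_list HOST LIST: succeeds only if HOST equals one whole line of LIST.
# -F = fixed string, -x = match the entire line, -q = no output.
host_in_list() {
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# The short name is a prefix of the FQDN, but an exact match still fails.
host_in_list "rs1" "$servers" && echo "short name: match" || echo "short name: no match"
host_in_list "rs1.example.com" "$servers" && echo "FQDN: match" || echo "FQDN: no match"
```

With an exact comparison like this, only the fully qualified name is found, which is the behaviour described above.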
https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305
https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175
https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192
This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh script.
Will provide patch soon.
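A rough sketch of the proposed direction, not the actual patch: resolve the local name with `hostname -f` and compare it with the target host. The function names `local_fqdn` and `is_local` are made up for this example, and the fallback to plain `hostname` when `-f` fails is an added assumption rather than something stated in this issue.

```shell
#!/usr/bin/env bash
# Print the local fully qualified name; fall back to the plain name
# if "hostname -f" fails (the fallback is an assumption of this sketch).
local_fqdn() {
  hostname -f 2>/dev/null || hostname
}

# is_local TARGET [LOCAL_NAME]: succeeds if TARGET names this machine.
# LOCAL_NAME defaults to local_fqdn; passing it explicitly makes the
# comparison easy to try with made-up names.
is_local() {
  local target="$1"
  local me="${2:-$(local_fqdn)}"
  [ "$target" = "$me" ]
}

# Hypothetical names: the FQDN matches, the short name does not.
is_local "rs1.example.com" "rs1.example.com" && echo "FQDN target: local"
is_local "rs1" "rs1.example.com" || echo "short target: not local"
```

In a script such as graceful_stop.sh, the one-argument form `is_local "$hostname"` would be the call that decides whether ssh is needed, assuming the operator passes the fully qualified name.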
Issue Links
- is fixed by
- HBASE-25663 Make graceful_stop localhostname compare match even if fqdn
- Reopened