Details
Description
Been here before (HBASE-7214 and HBASE-19652). Testing on large cluster, Master log is in a continuous spew of logging output fililng disks. It is stuck making no progress but hard to tell because it is logging minutiae rather than a call on whats actually wrong.
Log is full of this:
2018-06-21 01:19:12,761 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/meta/bd49572de3914b66985fff5ea2ca7403 2018-06-21 01:19:12,761 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/meta/fad01294c6ca421db209e89b5b97d364 2018-06-21 01:19:12,823 WARN org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 60000 ms for deleting hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/d3f759d0495257fc1d33ae780b634455/tiny/b72bac4036444dcf9265c7b5664fd403, exit... 2018-06-21 01:19:12,823 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9 2018-06-21 01:19:12,824 WARN org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 60000 ms for deleting hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/2425053ad86823081b368e00bc471e56/tiny/6ea3cb1174434aecbc448e322e2a062c, exit... 2018-06-21 01:19:12,824 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/big 2018-06-21 01:19:12,824 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/tiny 2018-06-21 01:19:12,827 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta 2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0 2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c 2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta/90f21dec28d140cda48d37eeb44d37e8 2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta/8a4cf6410d5a4201963bc1415945f877 2018-06-21 01:19:12,848 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta 2018-06-21 01:19:12,849 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta 2018-06-21 01:19:12,927 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta/6043fce5761e4479819b15405183f193 2018-06-21 01:19:12,927 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta/69e6bf4650124859b2bc7ddf134be642 2018-06-21 01:19:13,011 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta/1a46700fbc434574a005c0b55879d5ed 2018-06-21 01:19:13,011 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta/44beca2adfb544999dd82e9cf8151be1 2018-06-21 01:19:13,048 WARN org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 60000 ms for deleting hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/1f60d5dcf666b1e47de3445c369b6411/tiny/29b934a9c8f74ec1a5806f3fabeee094, exit...
It looks like we make no progress because we give up after 60 seconds (see above). The dir in this case has ~17k WALs in it (Cluster has hundreds of servers and has been running test for a while). When I do a listing, the count of WALs is not changing.
There is also this output below that does not help.
2018-06-21 00:16:05,852 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gW^6\xB5Z}\x00,1529565022477.4c27c2ef56a0aa3adae325a20d0a229b. 2018-06-21 00:16:05,855 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gP\x10\xE2\xD1\xFD\xC2\x00,1529565022477.cee1017900682956e90e94be1e1c10da. 2018-06-21 00:16:05,858 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gH\xC3\x8E\xEE\xA1\x07\x00,1529565022477.d4184dddf837bb3c19eb0cc0d87ccdf5. 2018-06-21 00:16:05,861 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g6\x82=693\x80,1529565022477.87faebdd14107f09126efa169b543dd6. 2018-06-21 00:16:05,863 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gS\xB7\x8C\xC3\xAC\x1F\x80,1529565022477.4d5a3efc33988180ad11c4eac8e64a4a. 2018-06-21 00:16:05,866 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g^\xAB\x8A\x98\xB78\x00,1529565022477.6a6945672c19b937fc2ccde13d2aff5d. 2018-06-21 00:16:05,869 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gmF2_p\xAE\x00,1529565022477.6607c6fef270c71c44caa1b8e69a63cb. 2018-06-21 00:16:05,872 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gi\x9F\x88m\xC2P\x80,1529565022477.5ecbc236145f85d857ffea9ad3c1d89c. 2018-06-21 00:16:05,875 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g[\x04\xE0\xA7\x08\xDA\x80,1529565022477.02a09a3ce49e05ea35a2ed51d306b37d. 2018-06-21 00:16:05,878 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gp\xEC\xDCQ\x1F\x0B\x80,1529565022477.0ce748886ee50a79691c7569a17ea30a. 2018-06-21 00:16:05,881 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g{\xE0\xDA&*$\x00,1529565022477.8d65e26b122c25f411c19555c9103b03. 2018-06-21 00:16:05,884 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x86\xD4\xD7\xFB5<\x80,1529565022477.88c1deebe2e914d9f323a557d1517ae8. 2018-06-21 00:16:05,888 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x95o\x7F\xC1\xEE\xB2\x80,1529565022477.1c3849e505edf7efd1391ae8da88ec09. 2018-06-21 00:16:05,891 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x8E"+\xDE\x91\xF7\x80,1529565022477.bb4eb5e775869e6cc3bc95fbc661c487. 2018-06-21 00:16:05,894 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,ge\xF8\xDE|\x13\xF3\x00,1529565022477.76afec6687b4ee782177424af54e8767. 2018-06-21 00:16:05,897 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gt\x93\x86B\xCDi\x00,1529565022477.0dd540d73e6dd115585de4e576b771ca. 2018-06-21 00:16:05,900 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x99\x16)\xB3\x9D\x10\x00,1529565022477.0dd49eaefa19cf1935f41661aa4e1dde. 2018-06-21 00:16:05,903 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gx:04{\xC6\x80,1529565022477.525b1aff5c2361101afce5f6c05a68b6. 2018-06-21 00:16:05,906 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x9C\xBC\xD3\xA5Km\x80,1529565022477.9fbb338f7e006c02c3a84ca926749674. ...
And this....
2018-06-21 00:16:39,617 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 104 for processing by initSASLContext 2018-06-21 00:16:39,617 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 0 from initSASLContext. 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 104 for processing by initSASLContext 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 50 for processing by initSASLContext 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 0 from initSASLContext. 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 50 from initSASLContext. 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 104 for processing by initSASLContext 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClient: SASL client context established. Negotiated QoP: auth 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 0 from initSASLContext. 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 50 for processing by initSASLContext 2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 50 from initSASLContext. 2018-06-21 00:16:39,619 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClient: SASL client context established. Negotiated QoP: auth 2018-06-21 00:16:39,619 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 50 for processing by initSASLContext 2018-06-21 00:16:39,619 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 50 from initSASLContext.
Attachments
Attachments
Issue Links
- links to