Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-20770

WAL cleaner logs way too much; gets clogged when lots of work to do

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 1.3.3, 1.4.6, 2.2.0, 2.0.2
    • Component/s: logging
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Been here before (HBASE-7214 and HBASE-19652). Testing on large cluster, Master log is in a continuous spew of logging output fililng disks. It is stuck making no progress but hard to tell because it is logging minutiae rather than a call on whats actually wrong.

      Log is full of this:

      2018-06-21 01:19:12,761 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/meta/bd49572de3914b66985fff5ea2ca7403
      2018-06-21 01:19:12,761 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/meta/fad01294c6ca421db209e89b5b97d364
      2018-06-21 01:19:12,823 WARN org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 60000 ms for deleting hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/d3f759d0495257fc1d33ae780b634455/tiny/b72bac4036444dcf9265c7b5664fd403, exit...
      2018-06-21 01:19:12,823 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9
      2018-06-21 01:19:12,824 WARN org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 60000 ms for deleting hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/2425053ad86823081b368e00bc471e56/tiny/6ea3cb1174434aecbc448e322e2a062c, exit...
      2018-06-21 01:19:12,824 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/big
      2018-06-21 01:19:12,824 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/tiny
      2018-06-21 01:19:12,827 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta
      2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0
      2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c
      2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta/90f21dec28d140cda48d37eeb44d37e8
      2018-06-21 01:19:12,844 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta/8a4cf6410d5a4201963bc1415945f877
      2018-06-21 01:19:12,848 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta
      2018-06-21 01:19:12,849 DEBUG org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta
      2018-06-21 01:19:12,927 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta/6043fce5761e4479819b15405183f193
      2018-06-21 01:19:12,927 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta/69e6bf4650124859b2bc7ddf134be642
      2018-06-21 01:19:13,011 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta/1a46700fbc434574a005c0b55879d5ed
      2018-06-21 01:19:13,011 DEBUG org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta/44beca2adfb544999dd82e9cf8151be1
      2018-06-21 01:19:13,048 WARN org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 60000 ms for deleting hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/1f60d5dcf666b1e47de3445c369b6411/tiny/29b934a9c8f74ec1a5806f3fabeee094, exit...
      

      It looks like we make no progress because we give up after 60 seconds (see above). The dir in this case has ~17k WALs in it (Cluster has hundreds of servers and has been running test for a while). When I do a listing, the count of WALs is not changing.

      There is also this output below that does not help.

      2018-06-21 00:16:05,852 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gW^6\xB5Z}\x00,1529565022477.4c27c2ef56a0aa3adae325a20d0a229b.
      2018-06-21 00:16:05,855 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gP\x10\xE2\xD1\xFD\xC2\x00,1529565022477.cee1017900682956e90e94be1e1c10da.
      2018-06-21 00:16:05,858 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gH\xC3\x8E\xEE\xA1\x07\x00,1529565022477.d4184dddf837bb3c19eb0cc0d87ccdf5.
      2018-06-21 00:16:05,861 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g6\x82=693\x80,1529565022477.87faebdd14107f09126efa169b543dd6.
      2018-06-21 00:16:05,863 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gS\xB7\x8C\xC3\xAC\x1F\x80,1529565022477.4d5a3efc33988180ad11c4eac8e64a4a.
      2018-06-21 00:16:05,866 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g^\xAB\x8A\x98\xB78\x00,1529565022477.6a6945672c19b937fc2ccde13d2aff5d.
      2018-06-21 00:16:05,869 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gmF2_p\xAE\x00,1529565022477.6607c6fef270c71c44caa1b8e69a63cb.
      2018-06-21 00:16:05,872 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gi\x9F\x88m\xC2P\x80,1529565022477.5ecbc236145f85d857ffea9ad3c1d89c.
      2018-06-21 00:16:05,875 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g[\x04\xE0\xA7\x08\xDA\x80,1529565022477.02a09a3ce49e05ea35a2ed51d306b37d.
      2018-06-21 00:16:05,878 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gp\xEC\xDCQ\x1F\x0B\x80,1529565022477.0ce748886ee50a79691c7569a17ea30a.
      2018-06-21 00:16:05,881 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g{\xE0\xDA&*$\x00,1529565022477.8d65e26b122c25f411c19555c9103b03.
      2018-06-21 00:16:05,884 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x86\xD4\xD7\xFB5<\x80,1529565022477.88c1deebe2e914d9f323a557d1517ae8.
      2018-06-21 00:16:05,888 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x95o\x7F\xC1\xEE\xB2\x80,1529565022477.1c3849e505edf7efd1391ae8da88ec09.
      2018-06-21 00:16:05,891 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x8E"+\xDE\x91\xF7\x80,1529565022477.bb4eb5e775869e6cc3bc95fbc661c487.
      2018-06-21 00:16:05,894 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,ge\xF8\xDE|\x13\xF3\x00,1529565022477.76afec6687b4ee782177424af54e8767.
      2018-06-21 00:16:05,897 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gt\x93\x86B\xCDi\x00,1529565022477.0dd540d73e6dd115585de4e576b771ca.
      2018-06-21 00:16:05,900 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x99\x16)\xB3\x9D\x10\x00,1529565022477.0dd49eaefa19cf1935f41661aa4e1dde.
      2018-06-21 00:16:05,903 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,gx:04{\xC6\x80,1529565022477.525b1aff5c2361101afce5f6c05a68b6.
      2018-06-21 00:16:05,906 DEBUG org.apache.hadoop.hbase.master.balancer.RegionLocationFinder: HDFSBlocksDistribution not found in cache for region IntegrationTestBigLinkedList,g\x9C\xBC\xD3\xA5Km\x80,1529565022477.9fbb338f7e006c02c3a84ca926749674.
      ...
      

      And this....

      2018-06-21 00:16:39,617 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 104 for processing by initSASLContext
      2018-06-21 00:16:39,617 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 0 from initSASLContext.
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 104 for processing by initSASLContext
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 50 for processing by initSASLContext
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 0 from initSASLContext.
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 50 from initSASLContext.
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 104 for processing by initSASLContext
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClient: SASL client context established. Negotiated QoP: auth
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 0 from initSASLContext.
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 50 for processing by initSASLContext
      2018-06-21 00:16:39,618 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 50 from initSASLContext.
      2018-06-21 00:16:39,619 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClient: SASL client context established. Negotiated QoP: auth
      2018-06-21 00:16:39,619 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will read input token of size 50 for processing by initSASLContext
      2018-06-21 00:16:39,619 DEBUG org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler: Will send token of size 50 from initSASLContext.
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                stack stack
                Reporter:
                stack stack
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: