[HBASE-10499] In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.98.0
Fix Version/s: 1.0.0, 0.98.10, 1.1.0
Component/s: None
Labels:
None

Hadoop Flags:

Reviewed

Description

I got this while testing 0.98RC. But am not sure if it is specific to this version. Doesn't seem so to me.
Also it is something similar to ~~HBASE-5312~~ and ~~HBASE-5568~~.

Using 10 threads i do writes to 4 RS using YCSB. The table created has 200 regions. In one of the run with 0.98 server and 0.98 client I faced this problem like the hlogs became more and the system requested flushes for those many regions.
One by one everything was flushed except one and that one thing remained unflushed. The ripple effect of this on the client side

com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
        at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245)
        at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73)
        at com.yahoo.ycsb.ClientThread.run(Client.java:307)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187)
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171)
        at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897)
        at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225)
        at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232)
        ... 2 more

On one of the RS

2014-02-11 08:45:58,714 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5
......................................................
2014-02-11 08:47:49,580 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39

2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689
2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868
2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847
2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099
2014-02-11 09:43:04,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 8677

2014-02-11 10:31:21,020 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=54, maxlogs=32; forcing flush of 1 regions(s): fdbb3242d3b673bbe4790a47bc30576f

I restarted another RS and there were region movements with other regions but this region stays with the RS that has this issue. One important observation is that in HRegion.internalflushCache() we need to add a debug log here

// If nothing to flush, return and avoid logging start/stop flush.
    if (this.memstoreSize.get() <= 0) {
      return false;
    }

Because we can see that the region is requsted for a flush but it does not happen and no logs related to flush are printed in the logs. so due to some reason this memstore.size() has become 0( I assume this). The earlier bugs were also due to similar reason.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

10499-0.98.txt
23/Jan/15 21:56
4 kB
Ted Yu
10499-1.0.txt
23/Jan/15 21:42
3 kB
Ted Yu
10499-v2.txt
17/Jan/15 03:32
2 kB
Ted Yu
10499-v3.txt
17/Jan/15 04:35
3 kB
Ted Yu
10499-v4.txt
17/Jan/15 05:03
3 kB
Ted Yu
10499-v6.txt
22/Jan/15 18:05
3 kB
Ted Yu
10499-v6.txt
22/Jan/15 15:42
3 kB
Ted Yu
10499-v7.txt
22/Jan/15 20:26
3 kB
Ted Yu
10499-v8.txt
23/Jan/15 05:53
3 kB
Ted Yu
compaction-queue.png
28/Jul/14 15:36
29 kB
Matt Kapilevich
HBASE-10499_v5.patch
19/Jan/15 07:05
1 kB
ramkrishna.s.vasudevan
HBASE-10499.patch
12/Feb/14 07:30
0.7 kB
ramkrishna.s.vasudevan
hbase-root-master-ip-10-157-0-229.zip
17/Feb/14 07:51
1005 kB
ramkrishna.s.vasudevan
hbase-root-regionserver-ip-10-93-128-92.zip
11/Feb/14 17:16
2.11 MB
ramkrishna.s.vasudevan
master_4e39.log
17/Feb/14 08:51
11 kB
Honghua Feng
master_576f.log
17/Feb/14 08:51
4 kB
Honghua Feng
rs_4e39.log
17/Feb/14 08:51
36 kB
Honghua Feng
rs_576f.log
17/Feb/14 08:51
98 kB
Honghua Feng
t1.dump
11/Feb/14 17:16
159 kB
ramkrishna.s.vasudevan
t2.dump
11/Feb/14 17:16
164 kB
ramkrishna.s.vasudevan
workloada_0.98.dat
14/Feb/14 08:53
35 kB
ramkrishna.s.vasudevan

In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException

Details

Description

Attachments

Attachments

Activity

People

Dates