HBase
  1. HBase
  2. HBASE-10499

In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Critical Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.98.0
    • Fix Version/s: 0.99.0, 0.98.2
    • Component/s: None
    • Labels:
      None

      Description

      I got this while testing 0.98RC. But am not sure if it is specific to this version. Doesn't seem so to me.
      Also it is something similar to HBASE-5312 and HBASE-5568.

      Using 10 threads i do writes to 4 RS using YCSB. The table created has 200 regions. In one of the run with 0.98 server and 0.98 client I faced this problem like the hlogs became more and the system requested flushes for those many regions.
      One by one everything was flushed except one and that one thing remained unflushed. The ripple effect of this on the client side

      com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
              at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245)
              at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73)
              at com.yahoo.ycsb.ClientThread.run(Client.java:307)
      Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
              at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187)
              at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171)
              at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897)
              at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961)
              at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225)
              at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232)
              ... 2 more
      

      On one of the RS

      2014-02-11 08:45:58,714 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5
      ......................................................
      2014-02-11 08:47:49,580 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39
      
      2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689
      2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868
      2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 20847
      2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 20099
      2014-02-11 09:43:04,238 INFO  [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 8677
      
      2014-02-11 10:31:21,020 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=54, maxlogs=32; forcing flush of 1 regions(s): fdbb3242d3b673bbe4790a47bc30576f
      

      I restarted another RS and there were region movements with other regions but this region stays with the RS that has this issue. One important observation is that in HRegion.internalflushCache() we need to add a debug log here

      // If nothing to flush, return and avoid logging start/stop flush.
          if (this.memstoreSize.get() <= 0) {
            return false;
          }
      

      Because we can see that the region is requsted for a flush but it does not happen and no logs related to flush are printed in the logs. so due to some reason this memstore.size() has become 0( I assume this). The earlier bugs were also due to similar reason.

      1. rs_576f.log
        98 kB
        Honghua Feng
      2. rs_4e39.log
        36 kB
        Honghua Feng
      3. master_576f.log
        4 kB
        Honghua Feng
      4. master_4e39.log
        11 kB
        Honghua Feng
      5. hbase-root-master-ip-10-157-0-229.zip
        1005 kB
        ramkrishna.s.vasudevan
      6. workloada_0.98.dat
        35 kB
        ramkrishna.s.vasudevan
      7. HBASE-10499.patch
        0.7 kB
        ramkrishna.s.vasudevan
      8. hbase-root-regionserver-ip-10-93-128-92.zip
        2.11 MB
        ramkrishna.s.vasudevan
      9. t2.dump
        164 kB
        ramkrishna.s.vasudevan
      10. t1.dump
        159 kB
        ramkrishna.s.vasudevan

        Activity

        ramkrishna.s.vasudevan created issue -
        ramkrishna.s.vasudevan made changes -
        Field Original Value New Value
        Fix Version/s 0.98.0 [ 12323143 ]
        ramkrishna.s.vasudevan made changes -
        Affects Version/s 0.98.0 [ 12323143 ]
        ramkrishna.s.vasudevan made changes -
        Attachment t1.dump [ 12628257 ]
        Attachment t2.dump [ 12628258 ]
        Attachment hbase-root-regionserver-ip-10-93-128-92.zip [ 12628259 ]
        ramkrishna.s.vasudevan made changes -
        Attachment HBASE-10499.patch [ 12628449 ]
        Andrew Purtell made changes -
        Fix Version/s 0.99.0 [ 12325675 ]
        ramkrishna.s.vasudevan made changes -
        Attachment workloada_0.98.dat [ 12628970 ]
        ramkrishna.s.vasudevan made changes -
        Attachment hbase-root-master-ip-10-157-0-229.zip [ 12629328 ]
        Honghua Feng made changes -
        Attachment master_4e39.log [ 12629339 ]
        Attachment master_576f.log [ 12629340 ]
        Attachment rs_4e39.log [ 12629341 ]
        Attachment rs_576f.log [ 12629342 ]
        Andrew Purtell made changes -
        Fix Version/s 0.98.2 [ 12326505 ]
        Fix Version/s 0.98.1 [ 12325664 ]

          People

          • Assignee:
            ramkrishna.s.vasudevan
            Reporter:
            ramkrishna.s.vasudevan
          • Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

            • Created:
              Updated:

              Development