HBase / HBASE-2053

Upper bound of outstanding WALs can be overrun


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels: None

    Description

      Kevin Peterson posted the following on hbase-user. Of interest is the link at the end, which points to logs of WAL rolls and removals. In one place we remove more than 70 logs at once because the oldest unflushed edit has moved past the highest sequence number in each of those logs, so our basic WAL removal mechanism is working. But if you study the log, the tendency is a steady climb in the number of logs. HLog#cleanOldLogs needs to notice such an upward trend and clean old logs more aggressively when it happens. Here is Kevin's note:

      On Tue, Dec 15, 2009 at 3:17 PM, Kevin Peterson <x@y.com> wrote:
      This makes some sense now. I currently have 2200 regions across 3 tables. My
      largest table accounts for about 1600 of those regions and is mostly active
      at one end of the keyspace -- our key is based on date, but data only
      roughly arrives in order. I also write to two secondary indexes, which have
      no pattern to the key at all. One of these secondary tables has 488 regions
      and the other has 96 regions.
      
      We write about 10M items per day to the main table (articles). All of these
      get written to one of the secondary indexes (article-ids). About a third get
      written to the other secondary index. Total volume of data written is about
      10 GB per day.
      
      I think the key is as you say that the regions aren't filled enough to
      flush. The articles table gets mostly written to near one end and I see
      splits happening regularly. The index tables have no key pattern, so the
      10 million writes get scattered across the different regions. I've looked more
      closely at a log file (linked below), and if I forget about my main table
      (which would tend to get flushed), and look only at the indexes, this seems
      to be what's happening:
      
      1. Up to maxLogs HLogs, it doesn't do any flushes.
      2. Once it gets above maxLogs, it will start flushing one region each time
      it creates a new HLog.
      3. If the first HLog had edits for, say, 50 regions, it will take 50 of these
      one-region flushes (one per new HLog) before that HLog can be removed.
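The three steps above can be reproduced with a toy simulation. This is a sketch under simplifying assumptions, not HBase code: every region (think of the index regions) receives edits in every log period, none reaches its own flush threshold, and once more than `maxLogs` logs are live, exactly one region, the one with the oldest unflushed edit, is flushed per roll. The class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical, heavily simplified model of the policy described above: every region
 * writes to every log, no region flushes on its own, and once more than maxLogs logs
 * are live, exactly one region (the one with the oldest unflushed edit) is flushed per roll.
 */
final class WalGrowthSimulation {

    private static final int CLEAN = Integer.MAX_VALUE; // region has no unflushed edits

    /** Returns the largest number of live logs observed right after a roll. */
    static int peakLiveLogs(int maxLogs, int regions, int rolls) {
        // Id of the log holding each region's oldest unflushed edit.
        int[] oldestLog = new int[regions];
        Arrays.fill(oldestLog, CLEAN);
        int current = 0; // id of the log currently being written
        int peak = 1;
        for (int step = 0; step < rolls; step++) {
            // Every region writes to the current log.
            for (int r = 0; r < regions; r++) {
                if (oldestLog[r] == CLEAN) {
                    oldestLog[r] = current;
                }
            }
            current++; // roll to a new, empty log
            int live = liveLogs(oldestLog, current);
            peak = Math.max(peak, live);
            if (live > maxLogs) {
                // Flush only the single region with the oldest unflushed edit.
                int oldest = 0;
                for (int r = 1; r < regions; r++) {
                    if (oldestLog[r] < oldestLog[oldest]) {
                        oldest = r;
                    }
                }
                oldestLog[oldest] = CLEAN;
            }
        }
        return peak;
    }

    /** Logs older than every region's oldest unflushed edit are removed; the rest are live. */
    private static int liveLogs(int[] oldestLog, int current) {
        int min = current;
        for (int id : oldestLog) {
            min = Math.min(min, id);
        }
        return current - min + 1;
    }
}
```

In this model `peakLiveLogs(3, 2, 50)` returns 5: the first log stays live until both of its regions have been flushed, one per roll, and when it finally goes, several logs are freed at once, much like the mass removal visible in the linked log.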
      
      If N is the number of regions getting written to, but not getting enough
      writes to flush on their own, then I think this converges to maxLogs + N
      logs on average. If I think of maxLogs as "number of logs to start flushing
      regions at" this makes sense.
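The N term can be seen with a short counting argument. This is a sketch under the same assumptions as the numbered list (the oldest live log holds edits from N regions that never flush on their own, and exactly one region is flushed per new log once the count exceeds maxLogs); it bounds the peak reached before that oldest log is freed rather than proving the long-run average. Writing $L$ for the number of live logs:

```latex
\begin{aligned}
L_{\text{first flush}} &= \mathrm{maxLogs} + 1 \\
L_{\text{after } j \text{ further rolls}} &= \mathrm{maxLogs} + 1 + j, \qquad 0 \le j \le N - 1 \\
L_{\text{peak}} &= \mathrm{maxLogs} + 1 + (N - 1) = \mathrm{maxLogs} + N
\end{aligned}
```

The oldest log is freed only by the N-th flush, which happens N - 1 rolls after the first one, so with 50 regions in the first HLog the count reaches maxLogs + 50 before it drops.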
      
      http://kdpeterson.net/paste/hbase-hadoop-regionserver-mi-prod-app35.ec2.biz360.com.log.2009-12-14
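The description asks for HLog#cleanOldLogs to clean more aggressively when the log count keeps climbing. One way to picture that is the hypothetical Java sketch below; it is not the HBase API and not necessarily what the attached patches do. It assumes each rolled log is tracked by its highest sequence id and each region by the sequence id of its oldest unflushed edit. A log is removable once its highest sequence id is below every region's oldest unflushed edit; when more than `maxLogs` rolled logs are still live, it requests a flush of every region with edits in the oldest live log, rather than one region per roll.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a WAL cleaning pass; names are illustrative, not HBase's HLog API. */
final class WalCleanerSketch {

    /** Logs that can be deleted now, plus regions whose flush would free the oldest live log. */
    record CleanResult(List<Long> removableLogs, List<String> regionsToFlush) {}

    /**
     * @param rolledLogs      rolled (closed) log id -> highest sequence id written to it;
     *                        the current, still-open log is deliberately not included
     * @param oldestUnflushed region -> sequence id of its oldest edit not yet flushed
     * @param maxLogs         live-log count above which flushes are requested
     */
    static CleanResult clean(TreeMap<Long, Long> rolledLogs,
                             Map<String, Long> oldestUnflushed,
                             int maxLogs) {
        // A log is removable once every edit in it has been flushed, i.e. its highest
        // sequence id is below the oldest unflushed edit of every region.
        long oldestOutstanding = oldestUnflushed.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MAX_VALUE);

        List<Long> removable = new ArrayList<>();
        List<Long> live = new ArrayList<>();
        for (Map.Entry<Long, Long> log : rolledLogs.entrySet()) {
            if (log.getValue() < oldestOutstanding) {
                removable.add(log.getKey());
            } else {
                live.add(log.getKey());
            }
        }

        // Aggressive variant: when too many logs remain, flag every region with edits
        // in the oldest live log, not just the single region with the oldest edit.
        List<String> toFlush = new ArrayList<>();
        if (live.size() > maxLogs) {
            long oldestLiveMaxSeq = rolledLogs.get(live.get(0));
            for (Map.Entry<String, Long> region : oldestUnflushed.entrySet()) {
                if (region.getValue() <= oldestLiveMaxSeq) {
                    toFlush.add(region.getKey());
                }
            }
        }
        return new CleanResult(removable, toFlush);
    }
}
```

With rolled logs 1 to 4 whose highest sequence ids are 10, 20, 30 and 40, and regions whose oldest unflushed edits are at 15, 18 and 35, log 1 is removable; with `maxLogs = 2` both regions at 15 and 18 are flagged, and once those two flushes complete, logs 2 and 3 become removable together. The trade-off in this sketch is more, smaller flushes in exchange for a bounded log count.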
      
      

      Attachments

        1. 2053.patch (11 kB, Michael Stack)
        2. 2053-v2.patch (11 kB, Michael Stack)
        3. hbase-root-regionserver-server-2.log.2009-12-22.gz (365 kB, Billy Pearson)


          People

            Assignee: Unassigned
            Reporter: Michael Stack (stack)
            Votes: 0
            Watchers: 6
