HBase / HBASE-1394

Uploads sometimes fall to 0 requests/second (Binding up on HLog#append?)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels: None

      Description

      Trying to figure out why the upload rate sometimes drops to zero.

      Studying the regionserver, HLog#append looks like a possible culprit.

      "IPC Server handler 7 on 60021" daemon prio=10 tid=0x000000004057dc00 nid=0x1bc4 in Object.wait() [0x0000000043393000..0x0000000043393b80]
         java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(Native Method)
              at java.lang.Object.wait(Object.java:485)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2964)
              - locked <0x00007f9e3e449ff0> (a java.util.LinkedList)
              - locked <0x00007f9e3e449e18> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
              at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
              at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
              at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
              - locked <0x00007f9e3e449e18> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
              at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
              at java.io.DataOutputStream.write(DataOutputStream.java:90)
              - locked <0x00007f9e434e5588> (a org.apache.hadoop.fs.FSDataOutputStream)
              at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1020)
              - locked <0x00007f9e434e55c0> (a org.apache.hadoop.io.SequenceFile$Writer)
              at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:984)
              - locked <0x00007f9e434e55c0> (a org.apache.hadoop.io.SequenceFile$Writer)
              at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:565)
              at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:521)
              - locked <0x00007f9dfa376f70> (a java.lang.Object)
              at org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1777)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1348)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1289)
              at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1727)
              at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:642)
              at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:911)
      
      Attachments

      1. loghowlongittookappendinghlog.patch (0.9 kB, stack)
      2. 1394.patch (21 kB, stack)


          Activity

          stack added a comment -

           Committed v2 after testing on our little cluster; logs are rotated more frequently now. Also fixed up the log messages a bit so they carry more info and refer to 'hlog' instead of just 'log', making it clearer what each message is about.

          stack added a comment -

           Patch that:

           + removes the ability to set compression on the commit log
           + changes the rotation trigger from a count of edits to a size-based check; we rotate if we exceed 90% of the blocksize (see the sketch below)
           + adds the ability to disable logging
           + adds a write time to HLogKey in preparation for threaded splitting of logs

           Needs testing.
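
           To illustrate the size-based trigger, here is a minimal sketch under the stated 90% rule (hypothetical helper class; the real check would live inside HLog itself):

             import java.io.IOException;
             import org.apache.hadoop.io.SequenceFile;

             // Hypothetical helper, not the committed patch: roll the hlog once the
             // bytes written so far pass 90% of the configured block size, so a
             // single log file never straddles an HDFS block transition.
             final class HLogRollCheck {
               private HLogRollCheck() {
               }

               static boolean shouldRoll(SequenceFile.Writer writer, long blocksize)
                   throws IOException {
                 // SequenceFile.Writer#getLength reports bytes written so far.
                 return writer.getLength() > (long) (blocksize * 0.9f);
               }
             }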

          stack added a comment -

           Looking at HBASE-1008, it's multiple writers with a single reader thread. The numbers posted in that issue by jdcyans look good. I'm thinking we should go with that for 0.20.0. The rewrite of log splitting, where we have multiple readers, multiple writers, and writers that write multiple logs down into Stores, can wait. Meantime, I'll make the change here in HLogKey to add the write time. I'll also set an upper size bound on logs rather than a number of edits, since that makes splitting less likely to OOME.

          stack added a comment -

           So, after removing the 1MB block size, log writing is probably fast enough again.

           I moved the refactoring of log splitting over to hbase-1008.

           That leaves:

           + Keeping logs under the block size so edits don't have to ride over a block transition.
           + Adding the ability to disable logging so we can see what it's like running fast.
           + Being clear that commit logs can't be compressed.

           After doing the above, I'll close this issue.

          stack added a comment -

           hbase-1413 removes the 1MB block size. Smaller blocks don't matter; we still lose it all. Removing it restores logging speed in that it's no longer common to see appends take > 1 second.

          stack added a comment -

           This seems to be the main reason logging is slow:

            <property>
              <name>hbase.regionserver.hlog.blocksize</name>
              <value>1048576</value>
              <description>Block size for HLog files. To minimize potential data loss,
              the size should be (avg key length) * (avg value length) * flushlogentries.
              Default 1MB.
              </description>
            </property>
          

           If I raise the block size and bump the sequence file buffer size from the default 4k to 16k, I no longer see appends taking > 1 second.
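
           For reference, a rough sketch of that tuning as configuration overrides (values are illustrative, and the buffer key is assumed to be Hadoop's io.file.buffer.size, the setting SequenceFile uses for its write buffer; the same keys could go in hbase-site.xml):

             import org.apache.hadoop.conf.Configuration;
             import org.apache.hadoop.hbase.HBaseConfiguration;

             public class HLogTuning {
               public static Configuration tunedConf() {
                 Configuration conf = new HBaseConfiguration();
                 // Lift the hlog block size well above 1MB (64MB here, for illustration).
                 conf.setLong("hbase.regionserver.hlog.blocksize", 64L * 1024 * 1024);
                 // Bump the SequenceFile write buffer from the 4k default to 16k.
                 conf.setInt("io.file.buffer.size", 16 * 1024);
                 return conf;
               }
             }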

          stack added a comment -

           Thinking yet more, with an idea from Ryan, I think we can do pooled commit log writing AND have the split run multithreaded. Here's how:

           Put up a pool of HLog commit log writers, say 3 by default, and round-robin edits across them. Add a timestamp to the HLogKey; the timestamp is the time at which we added the edit to the log. Logs are rotated off as they fill.

           On crash, the master starts up as many threads as there are log files, up to some limit. Each thread splits its log, writing a new one per Region (though it looks like it should be per Store given how things are currently processed). Being multithreaded, the split should run faster.

           When the region comes online, it'll have a bunch of files to process, as many as there were pool writers. It opens all of them and then applies them to the memcache, always adding the entry with the lowest HLogKey timestamp first (Ryan's idea).

           This should speed up our writing and speed recovery on crash.
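
           A hypothetical sketch of the pooled writer, to make the shape concrete (the class and the HLogKey.setWriteTime call are assumptions standing in for the proposed write-time field, not existing code):

             import java.io.IOException;
             import java.util.concurrent.atomic.AtomicLong;
             import org.apache.hadoop.hbase.regionserver.HLogEdit;
             import org.apache.hadoop.hbase.regionserver.HLogKey;
             import org.apache.hadoop.io.SequenceFile;

             // Edits are sprayed round-robin across N SequenceFile writers; each key
             // is stamped with its write time so a later split/replay can merge the
             // N files back into timestamp order.
             class PooledCommitLog {
               private final SequenceFile.Writer[] pool;       // e.g. 3 writers by default
               private final AtomicLong counter = new AtomicLong();

               PooledCommitLog(SequenceFile.Writer[] pool) {
                 this.pool = pool;
               }

               void append(HLogKey key, HLogEdit edit) throws IOException {
                 key.setWriteTime(System.currentTimeMillis()); // assumed new field on HLogKey
                 int i = (int) (counter.getAndIncrement() % pool.length);
                 synchronized (pool[i]) {
                   pool[i].append(key, edit);
                 }
               }
             }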

          stack added a comment -

           Thinking more: the log split involves reading in a regionserver's commit logs and dividing up the edits by region. I think we can have an HLog that opens a pool of N commit logs now, with the logger distributing edits over the pool of log writers. What that means is that while edits are written in order, because we spray them across N files they could be processed out of order as we read them back in; e.g. with 3 logs in the pool, edit A goes into file 1, edit B into file 2, edit C into file 3, and so on. Since our current basic split process reads a file at a time, we'd read file 1 and find edit A, then D, and so on. Processing these files serially, we'd write the region-scoped edits into new SequenceFiles (not sorted MapFiles or hfiles), one per region. These files are then picked up on region open and the edits inserted into memcache (we read the log on the open of each individual store, which seems profligate). The insertion into memcache will put them back into the 'right' order; newer edits will sort before older, etc.

           The only place I see a possible problem is if you insert a KV with one value and then afterward insert a KV with the exact same r/c/ts coordinates but a different value. It's possible that this ordering will not be honored on crash. Is that OK?

          stack added a comment -

           I was thinking the log should be compressed. Now I think otherwise. With gzip, for instance, the data after the header is compressed in "blocks" of 32k (IIRC). That means that since the open gzip block is never tied off properly on a crash, we would lose up to 32k of edits regardless of whether we hdfs sync.
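
           A minimal sketch of what leaving the commit log uncompressed means at the SequenceFile level (hypothetical factory class; the real writer setup lives in HLog):

             import java.io.IOException;
             import org.apache.hadoop.conf.Configuration;
             import org.apache.hadoop.fs.FileSystem;
             import org.apache.hadoop.fs.Path;
             import org.apache.hadoop.hbase.regionserver.HLogEdit;
             import org.apache.hadoop.hbase.regionserver.HLogKey;
             import org.apache.hadoop.io.SequenceFile;
             import org.apache.hadoop.io.SequenceFile.CompressionType;

             class UncompressedLogFactory {
               // Open the log with compression explicitly off so every edit goes to
               // the stream as written rather than sitting in an open compression block.
               static SequenceFile.Writer open(FileSystem fs, Configuration conf, Path logPath)
                   throws IOException {
                 return SequenceFile.createWriter(fs, conf, logPath,
                     HLogKey.class, HLogEdit.class, CompressionType.NONE);
               }
             }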

          stack added a comment -

           TODO: Add the ability to disable the commit log. Bulk uploaders might appreciate this (Ryan's suggestion); a sketch follows at the end of this comment.

           I tried running without logging to see what would happen. The behavior of HRegionServer changes drastically, from walk-in-the-park dawdling to sprint/rest/sprint/rest, etc. The rate when it's flowing is way up, maybe 10x, but then we quickly run up against the flusher; everything pauses because memory is saturated and flushing is taking too long.

          It also seems that compactions are overrun – possibly because of HBASE-1058:

          [stack@aa0-000-13 ~]$ grep 'Too many s' trunk/logs/hbase-stack-regionserver-aa0-000-13.u.powerset.com.log|grep 1242073708841
          2009-05-11 20:29:26,931 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 5, waiting
          2009-05-11 20:29:36,388 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 5, waiting
          2009-05-11 20:29:41,234 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 6, waiting
          2009-05-11 20:29:47,664 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 6, waiting
          2009-05-11 20:29:52,046 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 7, waiting
          2009-05-11 20:30:11,866 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 8, waiting
          2009-05-11 20:30:17,860 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 7, waiting
          2009-05-11 20:30:36,076 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 9, waiting
          2009-05-11 20:30:55,493 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 8, waiting
          2009-05-11 20:30:59,658 [IPC Server handler 9 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 8, waiting
          2009-05-11 20:31:11,759 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 10, waiting
          2009-05-11 20:32:06,046 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 11, waiting
          2009-05-11 20:32:31,661 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 12, waiting
          2009-05-11 20:32:51,110 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 13, waiting
          2009-05-11 20:33:18,677 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 14, waiting
          2009-05-11 20:33:47,818 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 15, waiting
          2009-05-11 20:34:01,475 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 9, waiting
          2009-05-11 20:34:18,966 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 16, waiting
          2009-05-11 20:34:23,467 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 10, waiting
          2009-05-11 20:34:46,749 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 17, waiting
          2009-05-11 20:35:15,330 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 18, waiting
          2009-05-11 20:35:41,234 [IPC Server handler 7 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 19, waiting
          2009-05-11 20:35:41,671 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 11, waiting
          2009-05-11 20:36:11,554 [IPC Server handler 0 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 12, waiting
          2009-05-11 20:36:23,204 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 20, waiting
          2009-05-11 20:37:00,864 [IPC Server handler 4 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 13, waiting
          2009-05-11 20:37:29,841 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 14, waiting
          2009-05-11 20:37:42,480 [IPC Server handler 5 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 14, waiting
          2009-05-11 20:37:46,068 [IPC Server handler 5 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 21, waiting
          2009-05-11 20:38:09,668 [IPC Server handler 7 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 15, waiting
          2009-05-11 20:38:26,761 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 16, waiting
          2009-05-11 20:38:38,506 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 22, waiting
          2009-05-11 20:38:54,858 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0651581333,1242073708841: 17, waiting
          2009-05-11 20:39:36,309 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 23, waiting
          2009-05-11 20:40:30,278 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 24, waiting
          2009-05-11 20:40:39,268 [IPC Server handler 4 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 25, waiting
          2009-05-11 20:41:12,208 [IPC Server handler 1 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 26, waiting
          2009-05-11 20:41:32,876 [IPC Server handler 1 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 27, waiting
          2009-05-11 20:42:03,696 [IPC Server handler 2 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 28, waiting
          2009-05-11 20:42:35,004 [IPC Server handler 1 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 29, waiting
          2009-05-11 20:43:21,116 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 30, waiting
          2009-05-11 20:44:01,246 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 31, waiting
          2009-05-11 20:44:23,626 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 32, waiting
          2009-05-11 20:44:47,639 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 33, waiting
          2009-05-11 20:45:29,544 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 34, waiting
          2009-05-11 20:45:53,016 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 35, waiting
          2009-05-11 20:46:00,022 [IPC Server handler 1 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 35, waiting
          2009-05-11 20:46:24,341 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 36, waiting
          2009-05-11 20:47:11,688 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 37, waiting
          2009-05-11 20:47:33,468 [IPC Server handler 7 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 37, waiting
          2009-05-11 20:48:07,649 [IPC Server handler 4 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 38, waiting
          2009-05-11 20:48:40,982 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 39, waiting
          2009-05-11 20:49:22,073 [IPC Server handler 8 on 60021] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 40, waiting
          2009-05-11 20:49:22,081 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 40, waiting
          2009-05-11 20:49:49,418 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 41, waiting
          2009-05-11 20:50:27,648 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 42, waiting
          2009-05-11 20:50:55,221 [regionserver/0:0:0:0:0:0:0:0:60021.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Too many store files for region TestTable,0452156162,1242073708841: 43, waiting
          

           TODO: Make issues for the above once it's easy to reproduce them because the commit log can be disabled.
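
           A hypothetical sketch of what the disable-logging switch might look like (the config key and the small Appender interface are made up for illustration; they only show the shape of the change, which is to skip the commit-log append when the flag is off):

             import java.io.IOException;
             import org.apache.hadoop.conf.Configuration;

             class OptionalCommitLog {
               interface Appender {                   // stands in for HLog#append
                 void append(byte[] regionName, byte[] row) throws IOException;
               }

               private final boolean enabled;
               private final Appender hlog;

               OptionalCommitLog(Configuration conf, Appender hlog) {
                 // Assumed key; a bulk uploader would set it to false.
                 this.enabled = conf.getBoolean("hbase.regionserver.hlog.enabled", true);
                 this.hlog = hlog;
               }

               void maybeAppend(byte[] regionName, byte[] row) throws IOException {
                 if (!enabled) {
                   return;    // trade crash recovery for raw upload speed
                 }
                 hlog.append(regionName, row);
               }
             }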

          stack added a comment -

           Trying to do a bulk PE import into my little 4-node cluster running 3 regionservers and 4 datanodes, I'm seeing about 600 rows a second on average. That's way too low.

          stack added a comment -

           Bringing this into 0.20.0. It probably won't be fixed by the 0.20.0 release, but we should be able to make some improvement to log writing.

          stack added a comment -

           From Raghu: write with fewer than 3 replicas (2?) and it'll go faster. Can I tell hdfs NOT to write the local replica? Ryan says that I can set replication on the file after it's closed, upping it to 3.
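
           A minimal sketch of that suggestion, assuming the rolled log file is already closed (the active log would be created with the lower replication in the first place):

             import java.io.IOException;
             import org.apache.hadoop.fs.FileSystem;
             import org.apache.hadoop.fs.Path;

             class LogReplicationBumper {
               // Bump a rolled-and-closed log file back up to 3 copies; the namenode
               // schedules the extra replication in the background.
               static void bumpToThree(FileSystem fs, Path closedLog) throws IOException {
                 fs.setReplication(closedLog, (short) 3);
               }
             }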

          stack added a comment -

           Made this critical; it's critical as regards our write performance.

          stack added a comment -

           The patch is wrong. Reverse the operands in this line: + long took = now - System.currentTimeMillis();
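
           That is, the elapsed time should be computed as in the doWrite snippet further down the page:

             long took = System.currentTimeMillis() - now;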

          stack added a comment -

           Looking at log sizes, they are regularly 110MB in this test, so there is at least one block transition inline. That probably costs.

           So, here are some tests to try using HLog and HDFS only; they can be done without involving the regionserver, just write a harness (see the sketch below):

           1. See if writing logs < hdfs block size improves things.
           2. In my measurements it doesn't seem to be an issue, but the close of a log file during a roll is inline: all writes stop while the close is being done. Measure doing the close out of band. Measure too opening the new log file before we need it, so that it's ready to go rather than holding up incoming edits while the new file is opened.

           I don't think the above will address the long appends I'm seeing at certain times during a bulk upload. Talking with Ryan, maybe we need a pool of loggers where we scatter edits over the pool; if one is off waiting on a DataStreamer to catch up, then that one write takes some time rather than blocking everything. This will make recovery harder in that we'll have to sort the regionserver's log pool, but we're going to have to rewrite that anyway.
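
           A minimal sketch of such a harness (record types, sizes, and the output path are arbitrary choices for illustration): append small records to a SequenceFile on HDFS and report any append that takes longer than a second, without involving a regionserver at all.

             import org.apache.hadoop.conf.Configuration;
             import org.apache.hadoop.fs.FileSystem;
             import org.apache.hadoop.fs.Path;
             import org.apache.hadoop.io.BytesWritable;
             import org.apache.hadoop.io.LongWritable;
             import org.apache.hadoop.io.SequenceFile;
             import org.apache.hadoop.io.SequenceFile.CompressionType;

             public class HdfsAppendTimer {
               public static void main(String[] args) throws Exception {
                 Configuration conf = new Configuration();
                 FileSystem fs = FileSystem.get(conf);
                 Path p = new Path("/tmp/hlog-append-test");
                 SequenceFile.Writer w = SequenceFile.createWriter(
                     fs, conf, p, LongWritable.class, BytesWritable.class,
                     CompressionType.NONE);
                 byte[] payload = new byte[1024];             // roughly a 1k edit
                 for (long i = 0; i < 1000000; i++) {
                   long start = System.currentTimeMillis();
                   w.append(new LongWritable(i), new BytesWritable(payload));
                   long took = System.currentTimeMillis() - start;
                   if (took > 1000) {
                     System.out.println("append " + i + " took " + took + "ms");
                   }
                 }
                 w.close();
                 fs.delete(p, true);
               }
             }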

          stack added a comment -

          Here is a sample from a recent upload:

          2009-05-09 20:50:05,268 [regionserver/0:0:0:0:0:0:0:0:60021.worker] DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region TestTable,0265339432,1241902192276, encoded=904457967
          2009-05-09 20:50:05,281 [regionserver/0:0:0:0:0:0:0:0:60021.worker] DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbasetrunk2/TestTable/904457967/info/1846359708595609270.99884447, isReference=true, sequence id=70531758, length=158510337, majorCompaction=false
          2009-05-09 20:50:05,292 [regionserver/0:0:0:0:0:0:0:0:60021.worker] DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbasetrunk2/TestTable/904457967/info/516935528186138325.99884447, isReference=true, sequence id=127140420, length=157409190, majorCompaction=false
          2009-05-09 20:50:05,292 [regionserver/0:0:0:0:0:0:0:0:60021.worker] DEBUG org.apache.hadoop.hbase.regionserver.Store: Loaded 2 file(s) in Store info, max sequence id 127140420
          2009-05-09 20:50:05,293 [regionserver/0:0:0:0:0:0:0:0:60021.worker] DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Next sequence id for region TestTable,0265339432,1241902192276 is 127140421
          2009-05-09 20:50:05,294 [regionserver/0:0:0:0:0:0:0:0:60021.worker] INFO org.apache.hadoop.hbase.regionserver.HRegion: region TestTable,0265339432,1241902192276/904457967 available
          2009-05-09 20:50:05,294 [regionserver/0:0:0:0:0:0:0:0:60021.worker] DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region TestTable,0265339432,1241902192276/904457967 because: Region open check
          2009-05-09 20:50:05,918 [IPC Server handler 8 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 8 on 60021 TOOK 1386ms to commit 1
          2009-05-09 20:50:06,410 [IPC Server handler 9 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 9 on 60021 TOOK 2140ms to commit 1
          2009-05-09 20:50:06,413 [IPC Server handler 5 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 5 on 60021 TOOK 1774ms to commit 1
          2009-05-09 20:50:06,413 [IPC Server handler 6 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 6 on 60021 TOOK 1882ms to commit 1
          2009-05-09 20:50:06,413 [IPC Server handler 1 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 1 on 60021 TOOK 1882ms to commit 1
          2009-05-09 20:50:06,414 [IPC Server handler 7 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 7 on 60021 TOOK 1883ms to commit 1
          2009-05-09 20:50:07,786 [IPC Server handler 8 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 8 on 60021 TOOK 1371ms to commit 1
          2009-05-09 20:50:07,786 [IPC Server handler 1 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 1 on 60021 TOOK 1372ms to commit 1
          2009-05-09 20:50:09,663 [IPC Server handler 7 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 7 on 60021 TOOK 1129ms to commit 1
          2009-05-09 20:50:11,458 [IPC Server handler 6 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 6 on 60021 TOOK 1192ms to commit 1
          2009-05-09 20:50:11,458 [IPC Server handler 0 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 0 on 60021 TOOK 1068ms to commit 1
          2009-05-09 20:50:11,458 [IPC Server handler 9 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 9 on 60021 TOOK 1177ms to commit 1
          2009-05-09 20:50:11,458 [IPC Server handler 7 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 7 on 60021 TOOK 1184ms to commit 1
          2009-05-09 20:50:11,458 [IPC Server handler 8 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 8 on 60021 TOOK 1177ms to commit 1
          2009-05-09 20:50:12,949 [IPC Server handler 6 on 60021] INFO org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 6 on 60021 TOOK 1061ms to commit 1
          ...
          

           That's a single put/append to the SequenceFile taking over a second to complete, and it goes on for a while. I had us log only appends over 1000ms, so I'm not sure how many total are actually going in during this period.

          stack added a comment -

           I'd like to highlight how much of a bottleneck this is, but I don't want to add a tax on every HLog.append to do it. If someone wants to see what this costs, they can apply the attached patch.

          stack added a comment -

           Here is another permutation on the "HDFS is overwhelmed" theme:

          "regionserver/0:0:0:0:0:0:0:0:60021.logRoller" daemon prio=10 tid=0x00007f3b90248400 nid=0x5dc8 waiting for monitor entry [0x0000000041fcd000..0x0000000041fcda80]
             java.lang.Thread.State: BLOCKED (on object monitor)
              at java.lang.Object.wait(Native Method)
              at java.lang.Object.wait(Object.java:485)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3105)
              - locked <0x00007f3bc7716460> (a java.util.LinkedList)
              - locked <0x00007f3bc7716888> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3196)
              - locked <0x00007f3bc7716888> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3145)
              at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
              at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
              at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
              - locked <0x00007f3bc7b118b0> (a org.apache.hadoop.io.SequenceFile$Writer)
              at org.apache.hadoop.hbase.regionserver.HLog.cleanupCurrentWriter(HLog.java:369)
              at org.apache.hadoop.hbase.regionserver.HLog.rollWriter(HLog.java:259)
              - locked <0x00007f3b9c8be490> (a java.lang.Object)
              at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:65)
          

           We're waiting for all packets to be cleared out of the DFSClient queue before closing. That can take a while.
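
           A minimal sketch of the "close out of band" idea from the earlier test list, assuming a hypothetical helper (error handling elided; this is not the committed change): hand the just-rolled writer to a background thread so the roll, and the appends blocked behind it, are not held up while DFSClient drains its packet queue.

             import java.io.IOException;
             import org.apache.hadoop.io.SequenceFile;

             class OutOfBandCloser {
               static void closeOutOfBand(final SequenceFile.Writer oldWriter) {
                 Thread t = new Thread(new Runnable() {
                   public void run() {
                     try {
                       oldWriter.close();  // may sit in DFSOutputStream.flushInternal for a while
                     } catch (IOException e) {
                       // A real implementation would surface this and force a log roll or abort.
                     }
                   }
                 }, "hlog-out-of-band-close");
                 t.setDaemon(true);
                 t.start();
               }
             }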

          stack added a comment -

           This is our main bottleneck writing into hdfs. To see for yourself, add logging like the below:

          Index: src/java/org/apache/hadoop/hbase/regionserver/HLog.java
          ===================================================================
          --- src/java/org/apache/hadoop/hbase/regionserver/HLog.java     (revision 773167)
          +++ src/java/org/apache/hadoop/hbase/regionserver/HLog.java     (working copy)
          @@ -556,11 +556,19 @@
             
             private void doWrite(HLogKey logKey, HLogEdit logEdit, boolean sync)
             throws IOException {
          +    long now = System.currentTimeMillis();
               try {
                 this.writer.append(logKey, logEdit);
                 if (sync || this.unflushedEntries.incrementAndGet() >= flushlogentries) {
                   sync();
                 }
          +      /* Enable this if you want to see how sometimes we're filling the hdfs
          +         queue of packets so we stall.  See HBASE-1394.
          +      long took = System.currentTimeMillis() - now;
          +      if (LOG.isDebugEnabled() && took > 1000) {
          +        LOG.debug(Thread.currentThread().getName() + " took " + took + "ms to append.");
          +      }
          +      */
               } catch (IOException e) {
                 LOG.fatal("Could not append. Requesting close of log", e);
                 requestLogRoll();
          

           We should work on writing files directly to HDFS for hbase to pick up afterward.

          stack added a comment -

          The line we're blocked on is below:

             // If queue is full, then wait till we can create enough space
             while (!closed && dataQueue.size() + ackQueue.size() > maxPackets) {
               try {
                 dataQueue.wait();
               } catch (InterruptedException e) {
               }
             }

           Here is a note on why we're blocked; it looks like the number of outstanding packets to send is at its maximum:

             * The client application writes data that is cached internally by
             * this stream. Data is broken up into packets, each packet is
             * typically 64K in size. A packet comprises of chunks. Each chunk
             * is typically 512 bytes and has an associated checksum with it.
             *
             * When a client application fills up the currentPacket, it is
             * enqueued into dataQueue. The DataStreamer thread picks up
             * packets from the dataQueue, sends it to the first datanode in
             * the pipeline and moves it from the dataQueue to the ackQueue.
             * The ResponseProcessor receives acks from the datanodes. When an
             * successful ack for a packet is received from all datanodes, the
             * ResponseProcessor removes the corresponding packet from the
             * ackQueue.
             *
             * In case of error, all outstanding packets and moved from
             * ackQueue. A new pipeline is setup by eliminating the bad
             * datanode from the original pipeline. The DataStreamer now
             * starts sending packets from the dataQueue.

           Here is the maximum number of outstanding packets:

             private int maxPackets = 80; // each packet 64K, total 5MB

          I looked at the ResponseProcessor and DataStreamer code – no obvious big stalls/sleeps.


             People

             • Assignee: Unassigned
             • Reporter: stack
             • Votes: 0
             • Watchers: 0
