HBASE-428

Under continuous upload of rows, WrongRegionExceptions are thrown that reach the client even after retries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.1.0, 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: regionserver
    • Labels:
      None
    • Environment:

      Linux 2.6.9-67.0.1.ELsmp #1 SMP Wed Dec 19 16:01:12 EST 2007 i686 athlon i386 GNU/Linux

      Description

      I have installed 0.16.0 rc 1, which I believe contains a fix for the similar issue HBASE-138, but I still see the same problem.

      • I am using a single node.
      • The client application runs in a single thread, loading data into a single table.
      • I get good throughput of about 200 rows/sec to start with, with occasional significant drops due to NotServingRegionExceptions that are recoverable on client retry (internal to hbase).
      • After 54 minutes and about 500,000 rows, I start to see WrongRegionExceptions in the client application, i.e. real failures. (Note that this compares to 0.15.3, which would begin to throw NotServingRegionExceptions after a few tens of thousands of rows.)

      My data consists of a single table with 5 column families. The data written is as follows:
      key: a URL
      family 1: a small string, often empty, 2 longs, 1 int
      family 2: a byte array averaging between 1k and 10k, a small string
      family 3: several columns with different names per row, values of small strings
      family 4: most rows have zero columns, some rows have 1 or more columns with a UL value
      The URLs are typically "long-ish" URLs as seen when crawling a site, not short home page URLs.

      I am assuming the data is stored in files of the form <hbaseroot>//<tablename>/<9digitnum>/data/mapfiles/<19digitnum>/data. I have attached a csv file showing the distribution of the sizes of these files. The average size is 19Mb, but the sizes are not evenly distributed at all.

      Here are two sample exceptions thrown, copied from the region server log:

      2008-02-08 02:08:22,495 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020, call batchUpdate(pagefetch,http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924,1202401088077, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@feb215) from 66.135.42.137:38484: error: org.apache.hadoop.hbase.WrongRegionException: Requested row out of range for HRegion pagefetch,http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924,1202401088077, startKey='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', getEndKey()='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', row='http://go2purdue.com/Redeemer_University.cfm?pt=2&sp=2&vid=1199243289_3X02X1468757255&rpt=2&kt=4&kp=1 wap2 20080102081237'
      org.apache.hadoop.hbase.WrongRegionException: Requested row out of range for HRegion pagefetch,http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924,1202401088077, startKey='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', getEndKey()='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', row='http://go2purdue.com/Redeemer_University.cfm?pt=2&sp=2&vid=1199243289_3X02X1468757255&rpt=2&kt=4&kp=1 wap2 20080102081237'
      at org.apache.hadoop.hbase.HRegion.checkRow(HRegion.java:1486)
      at org.apache.hadoop.hbase.HRegion.obtainRowLock(HRegion.java:1531)
      at org.apache.hadoop.hbase.HRegion.batchUpdate(HRegion.java:1226)
      at org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1433)
      at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
      2008-02-08 02:08:22,696 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020, call batchUpdate(pagefetch,http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924,1202401088077, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@15d9be1) from 66.135.42.137:38484: error: org.apache.hadoop.hbase.WrongRegionException: Requested row out of range for HRegion pagefetch,http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924,1202401088077, startKey='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', getEndKey()='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', row='http://go2umass.com/Travel.cfm?pt=2&sp=2&vid=1199230721_3X04X1485302803&rpt=2&kt=5&kp=8 wap2 20080102081239'
      org.apache.hadoop.hbase.WrongRegionException: Requested row out of range for HRegion pagefetch,http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924,1202401088077, startKey='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', getEndKey()='http://galsn1.mobilook.mobiwap.com/bm/listproducts;jsessionid=D2ED1EB898163CDB27135DC2CF6958B3.197B?rsi=78011 wap2 20080102052924', row='http://go2umass.com/Travel.cfm?pt=2&sp=2&vid=1199230721_3X04X1485302803&rpt=2&kt=5&kp=8 wap2 20080102081239'
      at org.apache.hadoop.hbase.HRegion.checkRow(HRegion.java:1486)
      at org.apache.hadoop.hbase.HRegion.obtainRowLock(HRegion.java:1531)
      at org.apache.hadoop.hbase.HRegion.batchUpdate(HRegion.java:1226)
      at org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1433)
      at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
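Both exceptions above report a region whose startKey and getEndKey() are the same string. The following is a minimal sketch of why such a region rejects every row, assuming the range check treats the start key as inclusive, the end key as exclusive, and an empty key as unbounded. The class name RowRangeCheck, the method rowInRange, and the shortened keys are hypothetical, and plain String comparison stands in for whatever key comparison HRegion.checkRow actually performs.

```java
public class RowRangeCheck {

    /**
     * Returns true when row lies in [startKey, endKey).
     * Assumption: an empty key means "unbounded" on that side.
     */
    public static boolean rowInRange(String startKey, String endKey, String row) {
        boolean atOrAfterStart = startKey.isEmpty() || startKey.compareTo(row) <= 0;
        boolean beforeEnd = endKey.isEmpty() || endKey.compareTo(row) > 0;
        return atOrAfterStart && beforeEnd;
    }

    public static void main(String[] args) {
        // Shortened, made-up keys standing in for the long URLs in the log.
        String degenerate = "http://galsn1.example/ wap2 20080102052924";
        String row = "http://go2purdue.example/ wap2 20080102081237";

        // Start key equals end key: no row is in range, not even the key itself.
        System.out.println(rowInRange(degenerate, degenerate, row));
        System.out.println(rowInRange(degenerate, degenerate, degenerate));

        // A region that genuinely spans the row accepts it.
        System.out.println(rowInRange("http://f", "http://h", row));
    }
}
```

Under these assumptions a region with identical non-empty start and end keys can never serve a request, so any client whose row is routed to it keeps failing no matter how often it retries.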

      1. 428-v5.patch
        10 kB
        Marc Harris
      2. 428-v4.patch
        11 kB
        stack
      3. 428-v3.patch
        7 kB
        stack
      4. 428-2.patch
        3 kB
        stack
      5. 428.patch
        3 kB
        stack
      6. selectfrommeta.txt
        121 kB
        Marc Harris
      7. lsr
        145 kB
        Marc Harris
      8. filesbysize.csv
        23 kB
        Marc Harris

        Activity

        Marc Harris added a comment -

        428-v4.patch was made against slightly the wrong version. v4 included the removal of some commented-out code in HRegionServer.java that caused the patch to fail (the code was already removed in the official 0.16.0-rc1 release).

        stack added a comment -

        Marc sent me logs of an upload to review. No more instances of WRE. Splits, compactions, and flushes run fine. Resolving.

        Marc Harris added a comment -

        Since applying 428-v4.patch and increasing my heap to 2G, I have not had a recurrence of this issue.

        Bryan Duxbury added a comment -

        This needs to get done before 0.2 can roll.

        stack added a comment -

        Applied v4 to branch and trunk. Will wait on feedback from Marc as to whether this addresses his issue.

        Bryan Duxbury added a comment -

        +1 the patch passes unit tests for me locally.

        stack added a comment -

        Added test for splittability. We were going ahead and calculating the midkey even though the store wasn't splittable anyway because it still had references.

        stack added a comment -

        v3 has more cleanup. Fixes a rare case where, if a store file had only a single entry, we were overwriting the calculated midKey for the region (could be what's wrong w/ our midkey calc. but wouldn't want to bet on it – seems like too rare an occurrence).

        M src/java/org/apache/hadoop/hbase/HStoreFile.java
        (finalKey) Removed checkKey if trying to get finalKey on top-half
        of a HalfMapFile. Was failing because it would compare the
        HalfMapFile midkey to the empty passed key into which we're to
        set the mapfile last key.
        M src/java/org/apache/hadoop/hbase/HStore.java
        If a return is being done inside the 'if' of an 'if/else', then
        the else is not needed. Fixed two of these. Removed commented
        out logging. Rewrote a cumbersome if/else as a ?:.
        Removed unused rowKey assignment. Removed unnecessary casts.
        Renamed local variable midkey as mk because it was too close
        to the passed in arg midKey. Do NOT copy
        midkey/mk before test that it was equal to start/end keys. A
        store could have a midkey/mk that equaled region start/end keys
        and we were nonetheless overwriting midKey, the passed in
        arg.
        M src/java/org/apache/hadoop/hbase/HRegionServer.java
        Removed commented out code. Output full regioninfo when splitting
        so can see start and end keys, etc.
        M src/java/org/apache/hadoop/hbase/HRegion.java
        If end or start key matches mid key, don't split.

        stack added a comment -

        Thanks for sending the logs Marc. I took a look. Things are messy – lots of concurrent compactions going on – but seemed to be holding up.

        The patch doesn't actually prevent the core problem. The patch added extra logging and it added not splitting if it would result in a region that had same start and end key. Here's what I see:

        2008-02-16 02:03:05,324 DEBUG org.apache.hadoop.hbase.HRegion: Split details for pagefetch,http://www.marketwatch.com/hdml wap2 20071222205256,1203126936284: startKey http://www.marketwatch.com/hdml wap2 20071222205256, midkey: http://www.marketwatch.com/hdml wap2 20071222205256, endKey http://www.myarmoury.com/mobile wap2 20071222205348
        2008-02-16 02:03:05,324 DEBUG org.apache.hadoop.hbase.HRegion: Startkey and midkey are same, not splitting
        

        In other words, no more WrongRegionExceptions because we notice ahead of time that the split will produce a region w/ same start and end key and we skip. Watching the logs, the region continues to grow in size and further attempts at splitting also pass because it would result in region w/ same start and end key. There is something up in our midkey calculation.
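The guard described above can be sketched as follows. This is an illustration only, not the code from the patch: the class name SplitGuard and the method shouldSplit are hypothetical, keys are modelled as plain Strings, and the null/empty check on the mid key is an added assumption. The keys in main are the ones from the "Split details" log line.

```java
public class SplitGuard {

    /**
     * Decides whether a split at midKey would produce two usable regions.
     * A mid key equal to the region's start or end key would leave one
     * daughter with identical start and end keys, so the split is skipped.
     */
    public static boolean shouldSplit(String startKey, String midKey, String endKey) {
        if (midKey == null || midKey.isEmpty()) {
            return false; // assumption: no usable mid key means no split
        }
        if (midKey.equals(startKey) || midKey.equals(endKey)) {
            return false; // would create a region with same start and end key
        }
        return true;
    }

    public static void main(String[] args) {
        String startKey = "http://www.marketwatch.com/hdml wap2 20071222205256";
        String midKey = "http://www.marketwatch.com/hdml wap2 20071222205256";
        String endKey = "http://www.myarmoury.com/mobile wap2 20071222205348";

        // Mirrors the logged case: start key and mid key are the same.
        System.out.println(shouldSplit(startKey, midKey, endKey));
    }
}
```

A guard of this kind only avoids creating the broken region. As the comment notes, the region then keeps growing because every later split attempt is skipped for the same reason, so the mid key calculation itself still needs fixing.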

        stack added a comment -

        Cleaned up version of patch I sent to Marc

        stack added a comment -

        Marc, see if #3 in the FAQ helps: http://wiki.apache.org/hadoop/Hbase/FAQ#3. Let us know how it goes.

        Marc Harris added a comment -

        The patch seems to have got past this error, but now a new one (out of heap) occurs later. Possibly this bug should be considered fixed and a new one opened.

        Current results:
        I no longer get WrongRegionExceptions.
        After approx 700,000 rows uploaded, the region server throws an OutOfMemoryError, followed by many "Server not running" exceptions (exception log below).
        I am able to restart the hbase region and master servers (and the client app), and store another 800,000 rows before the same OutOfMemoryError.
        After that, I can restart the hbase region and master servers (and the client app), but continuing the upload causes more OutOfMemoryError exceptions quickly.

        Full logs will be sent to stack by e-mail.

        008-02-16 02:24:38,884 INFO org.apache.hadoop.hbase.HLog: new log writer created at hdfs://server14:54310/hbase/log_66.135.42.137_1203123804816_60020/hlog.dat.322
        2008-02-16 02:25:45,751 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region pagefetch,http://www.marketwatch.com/hdml wap2 20071222205256,1203126936284. Size 62.6m
        2008-02-16 02:25:57,378 FATAL org.apache.hadoop.hbase.HRegionServer: Set stop flag in regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher
        java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:1518)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2125)
        at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
        at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
        at org.apache.hadoop.io.MapFile$Writer.append(MapFile.java:188)
        at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Writer.append(HStoreFile.java:721)
        at org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:1113)
        at org.apache.hadoop.hbase.HStore.flushCache(HStore.java:1081)
        at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:954)
        at org.apache.hadoop.hbase.HRegion.flushcache(HRegion.java:852)
        at org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:417)
        2008-02-16 02:25:57,405 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020, call batchUpdate(pagefetch,http://mobility.mobi/showthread.php?goto=newpost&t=2677 wap2 20071223005632,1203126357490, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@9bad5a) from 66.135.42.137:56275: error: java.io.IOException: Server not running
        java.io.IOException: Server not running
        at org.apache.hadoop.hbase.HRegionServer.checkOpen(HRegionServer.java:1626)
        at org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1429)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
        2008-02-16 02:25:57,406 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60020, call getClosestRowBefore(.META.,,1, pagefetch,http://pda.physorg.com/lofi-news-seafloor-fault-tsunami_114370203.html wap2 20080102111657,999999999999999, 9223372036854775807) from 66.135.42.137:56275: error: java.io.IOException: Server not running

        stack added a comment -

        Patch to add more logging around split. Adds consistency check to make sure start and end keys are not same on a split. Logs split details.

        stack added a comment -

        Marking blocker. Assigning myself. Fix needs to be backported.

        stack added a comment -

        Thanks for posting the .META. select Marc.

        I've noticed a few things. Here's a region whose start and end key is same:

        2008-02-10 16:18:15,134 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_OPEN : pagefetch,http://fun.twilightwap.com/rate.asp?joke_id=183&rating=0 wap2 20080102055026,1202660291003 from 66.135.42.137:60020
        2008-02-10 16:18:15,134 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: PendingOpenOperation from 66.135.42.137:60020
        2008-02-10 16:18:15,134 INFO org.apache.hadoop.hbase.HMaster: 66.135.42.137:60020 serving pagefetch,http://fun.twilightwap.com/rate.asp?joke_id=183&rating=0 wap2 20080102055026,1202660291003
        2008-02-10 16:18:15,134 INFO org.apache.hadoop.hbase.HMaster: regionname: pagefetch,http://flirtbox.mobi/new.php?type=html&forum_id=95&topic_index=0 wap2 20071222232620,1202660291003, startKey: <http://flirtbox.mobi/new.php?type=html&forum_id=95&topic_index=0 wap2 20071222232620>, endKey: <http://fun.twilightwap.com/rate.asp?joke_id=183&rating=0 wap2 20080102055026>, encodedName: 1636112728, tableDesc: {name: pagefetch, families: {changedata:={name: changedata, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, data:={name: data, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, headers:={name: headers, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, redirects:={name: redirects, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} open on 66.135.42.137:60020
        

        Here is the region that was split that produced the above:

        2008-02-10 16:17:54,112 INFO org.apache.hadoop.hbase.HMaster: regionname: pagefetch,http://flirtbox.mobi/new.php?type=html&forum_id=95&topic_index=0 wap2 20071222232620,1202660269165, startKey: <http://flirtbox.mobi/new.php?type=html&forum_id=95&topic_index=0 wap2 20071222232620>, endKey: <http://go2uwash.com/ wap2 20071222205139>, encodedName: 7645492, tableDesc: {name: pagefetch, families: {changedata:={name: changedata, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, data:={name: data, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, headers:={name: headers, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, redirects:={name: redirects, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} open on 66.135.42.137:60020
        

        Looks like it has go2uwash as end key. Why doesn't the fun.twilightwap.com region have go2uwash as its end key? The row we are trying to insert is 'http://go2purdue.com/Indiana_State_University_Terre_Haute.cfm?pt=2&sp=2&vid=1199235588_3X02X1468516268&rpt=2&kt=5&kp=8 wap2 20080102090745', which would go into this region if go2uwash were the end key.

        For good measure, here is the regionserver split report:

        2008-02-10 16:18:12,053 INFO org.apache.hadoop.hbase.HRegionServer: region split, META updated, and report to master all successful. Old region=pagefetch,http://flirtbox.mobi/new.php?type=html&forum_id=95&topic_index=0 wap2 20071222232620,1202660269165, new regions: pagefetch,http://flirtbox.mobi/new.php?type=html&forum_id=95&topic_index=0 wap2 20071222232620,1202660291003, pagefetch,http://fun.twilightwap.com/rate.asp?joke_id=183&rating=0 wap2 20080102055026,1202660291003. Split took 1sec
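The expectation behind the question above is that the two daughters of a split tile the parent's key range: the lower daughter runs from the parent's start key to the split point, and the upper daughter runs from the split point to the parent's end key. A minimal sketch of that consistency check follows. The class name SplitCoverage, the method daughtersCoverParent, and the shortened keys are hypothetical and are not taken from the HBase source.

```java
public class SplitCoverage {

    /**
     * Returns true when the two daughter ranges exactly tile the parent range:
     * [parentStart, split) followed by [split, parentEnd).
     */
    public static boolean daughtersCoverParent(
            String parentStart, String parentEnd,
            String lowerStart, String lowerEnd,
            String upperStart, String upperEnd) {
        return lowerStart.equals(parentStart)
                && lowerEnd.equals(upperStart)
                && upperEnd.equals(parentEnd);
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the keys discussed in the comment.
        String flirtbox = "http://flirtbox.mobi/...";
        String twilight = "http://fun.twilightwap.com/...";
        String go2uwash = "http://go2uwash.com/...";

        // Expected layout: upper daughter inherits the parent's end key.
        System.out.println(daughtersCoverParent(
                flirtbox, go2uwash, flirtbox, twilight, twilight, go2uwash));

        // Reported symptom: upper daughter's end key equals its own start key.
        System.out.println(daughtersCoverParent(
                flirtbox, go2uwash, flirtbox, twilight, twilight, twilight));
    }
}
```

In the second call the rows between the split point and the parent's end key, such as the go2purdue.com row, are left without a region that covers them.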
        
        Marc Harris added a comment -

        The results of running select * from .META. in an HBase shell. I notice that there are some non-printable characters. I'm not sure if those are a result of the terminal I was using, or something else. I couldn't figure out a way to redirect the output of the select command directly to a file.

        stack added a comment -

        Thanks Marc. That listing looks pretty good. Says to me that hbase is not being overwhelmed, it's just the WRE that's crippled the upload. If we can figure that...

        Marc Harris added a comment -

        Executed
        bin/hadoop fs -lsr /
        as suggested (hbase is the only thing in the hadoop instance).

        stack added a comment -

        Marc:

        On sizes, yeah, they'll vary. In future, just do a ./bin/hadoop fs -lsr on your hbase.rootdir if you want to convey the varying sizes. The lsr will also show better how the mapfiles are grouped (and we can see if compaction and splitting is keeping up w/ the upload rate).

        Marc Harris added a comment -

        Shows distribution of data file size


          People

          • Assignee:
            stack
            Reporter:
            Marc Harris