Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-17069

RegionServer writes invalid META entries for split daughters in some circumstances

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 1.4.0, 1.2.4, 2.0.0
    • Fix Version/s: 1.4.0, 1.3.1, 1.2.5, 2.0.0
    • Component/s: wal
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I have been seeing frequent ITBLL failures testing various versions of 1.2.x.

      Over the lifetime of 1.2.x the following issues have been fixed:

      • HBASE-15315 (Remove always set super user call as high priority)
      • HBASE-16093 (Fix splits failed before creating daughter regions leave meta inconsistent)

      And this one is pending:

      • HBASE-17044 (Fix merge failed before creating merged region leaves meta inconsistent)

      I can apply all of the above to branch-1.2 and still see this failure:

      The life of stillborn region d55ef81c2f8299abbddfce0445067830

      Master sees SPLITTING_NEW

      2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579001186, server=node-3.cluster,16020,1478578389506}
      

      The RegionServer creates it

      2016-11-08 04:23:26,035 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, currentSize=14996112, freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      2016-11-08 04:23:26,038 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for big: blockCache=LruBlockCache{blockCount=34, currentSize=14996112, freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      2016-11-08 04:23:26,442 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, currentSize=17187656, freeSize=12821524664, maxSize=12838712320, heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      2016-11-08 04:23:26,713 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      2016-11-08 04:23:26,715 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      2016-11-08 04:23:26,717 INFO  [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      

      The RegionServer onlines it

      2016-11-08 04:23:27,015 INFO  [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830] regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next sequenceid=19184
      2016-11-08 04:23:27,029 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] regionserver.HRegionServer: Post open deploy tasks for IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830.
      2016-11-08 04:23:27,047 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] hbase.MetaTableAccessor: Updated row IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. with server=node-3.cluster,16020,1478578389506
      

      The Master transitions state from SPLITTING_NEW to OPEN

      2016-11-08 04:23:27,058 INFO  [AM.ZK.Worker-pool2-t84] master.RegionStates: Transition {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579007057, server=node-3.cluster,16020,1478578389506} to {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058, server=node-3.cluster,16020,1478578389506}
      2016-11-08 04:23:27,059 INFO  [AM.ZK.Worker-pool2-t84] master.AssignmentManager: Handled SPLIT event; parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2., daughter a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830., daughter b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb., on node-3.cluster,16020,1478578389506
      

      RegionServer updates META - BUT APPARENTLY NOT CORRECTLY

      2016-11-08 04:23:27,165 INFO  [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2., new regions: IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830., IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.. Split took 6sec
      

      RegionServer delays flush
      (Is this important?)

      2016-11-08 04:24:14,639 WARN  [MemStoreFlusher.0] regionserver.MemStoreFlusher: Region IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. has too many store files; delaying flush up to 90000ms
      

      Immediate warnings about No serialized HRegionInfo

      2016-11-08 04:24:44,691 WARN  [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
      

      Master is not happy either

      2016-11-08 04:24:51,148 WARN  [MASTER_TABLE_OPERATIONS-node-1:16000-0] hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0}
      

      TestRunner MetaScanner complains about invalid entries in META missing HRegionInfo

      (standard input):9086:2016-11-08 05:04:17,230 WARN  [B.defaultRpcServer.handler=4,queue=1,port=16000] hbase.MetaTableAccessor: No serialized HRegionInfo in keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
      

      ITBLL MapReduce tasks fail because part of the keyspace cannot be located:

      java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185)
              at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
              at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
              at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
              at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154)
              at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121)
              at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486)
              at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431)
              at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375)
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
      
      ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
      
      ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
      
      ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0}
      
      ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
      
      ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0}
      

        Attachments

        1. daughter_1_d55ef81c2f8299abbddfce0445067830.log
          148 kB
          Andrew Purtell
        2. daughter_2_08629d59564726da2497f70451aafcdb.log
          141 kB
          Andrew Purtell
        3. parent-393d2bfd8b1c52ce08540306659624f2.log
          874 kB
          Andrew Purtell
        4. logs.tar.gz
          3.00 MB
          Andrew Purtell
        5. HBASE-17069.branch-1.3.001.patch
          6 kB
          Abhishek Singh Chouhan
        6. HBASE-17069.branch-1.3.002.patch
          7 kB
          Abhishek Singh Chouhan
        7. HBASE-17069.master.001.patch
          6 kB
          Abhishek Singh Chouhan
        8. HBASE-17069-addendum.branch-1.3.001.patch
          1 kB
          Abhishek Singh Chouhan

          Issue Links

            Activity

              People

              • Assignee:
                abhishek.chouhan Abhishek Singh Chouhan
                Reporter:
                apurtell Andrew Purtell
              • Votes:
                0 Vote for this issue
                Watchers:
                26 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: