Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-14536

Balancer & SSH interfering with each other leading to unavailability

    XMLWordPrintableJSON

Details

    Description

      Came across this in our cluster:
      1. The meta was assigned to a server 10.0.0.149,16020,1443507203340

      2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56] 
      master.RegionStates: Onlined 1588230740 on 
      10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME => 
      'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
      

      2. The server dies at some point:

      2015-09-29 06:18:25,952 INFO  [main-EventThread] 
      zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, 
      processing expiration [10.0.0.149,16020,1443507203340]
      
      2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager: based on AM, current 
      region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340 server being checked: 
      10.0.0.149,16020,1443507203340
      

      3. The balancer had computed a plan that contained a move for the meta:

      2015-09-29 06:18:26,833 INFO  [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster: 
      balance hri=hbase:meta,,1.1588230740, 
      src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905
      

      4. The following ensues after this, leading to the meta remaining unassigned:

      2015-09-29 06:18:26,859 DEBUG [B.defaultRpcServer.handler=12,queue=0,port=16000] 
      master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to 
      unassign since it's on a dead server: 10.0.0.149,16020,1443507203340
      ......................
      2015-09-29 06:18:26,899 INFO  [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates: 
      Offlined 1588230740 from 10.0.0.149,16020,1443507203340
      .....................
      2015-09-29 06:18:26,914 INFO  [B.defaultRpcServer.handler=12,queue=0,port=16000] 
      master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is 
      on a dead but not processed yet server: 10.0.0.149,16020,1443507203340
      ....................
      2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58] master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted, 
      state: {1588230740 state=OFFLINE, ts=1443507506914, 
      server=10.0.0.149,16020,1443507203340}
      ....................
      2015-09-29 06:18:29,447 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager: based on AM, current 
      region=hbase:meta,,1.1588230740 is on server=null server being checked: 
      10.0.0.149,16020,1443507203340
      
      2015-09-29 06:18:29,451 INFO  [MASTER_META_SERVER_OPERATIONS-
      10.0.0.148:16000-2] handler.MetaServerShutdownHandler: META has been 
      assigned to otherwhere, skip assigning.
      
      2015-09-29 06:18:29,452 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] 
      master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340
      

      Attachments

        1. HBASE-14536.v1-branch-1.patch
          10 kB
          Stephen Yuan Jiang
        2. HBASE-14536.v3-branch-1.1.patch
          8 kB
          Stephen Yuan Jiang
        3. HBASE-14536.v2-branch-1.1.patch
          8 kB
          Stephen Yuan Jiang
        4. HBASE-14536.v1-branch-1.1.patch
          2 kB
          Stephen Yuan Jiang
        5. master-log.tgz
          2.75 MB
          Devaraj Das

        Issue Links

          Activity

            People

              syuanjiang Stephen Yuan Jiang
              ddas Devaraj Das
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: