HBASE-10210

During master startup, an RS can be you-are-dead-ed by the master in error


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1
    • Fix Version/s: 0.98.0, 0.99.0
    • Component/s: None
    • Labels: None

    Description

      I am not sure of the root cause yet; I am at the "how did this ever work" stage.
      We see this problem in 0.96.1, but did not see it in 0.96.0 plus some patches.

      It looks like RS information arriving from two sources, ZK and the server itself, can conflict. The master doesn't handle such cases (matching timestamps), and in any case timestamps can technically collide for two separate servers.

      So the master YouAreDead-s the already-recorded reporting RS, and also adds it. Then it discovers that the new server has died with a fatal error!

      Note the threads in the log below: the addition is called both from master initialization and from RPC.

      2013-12-19 11:16:45,290 INFO  [master:h2-ubuntu12-sec-1387431063-hbase-10:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 2, slept for 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
      2013-12-19 11:16:45,290 INFO  [master:h2-ubuntu12-sec-1387431063-hbase-10:60000] master.ServerManager: Registering server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
      2013-12-19 11:16:45,290 INFO  [master:h2-ubuntu12-sec-1387431063-hbase-10:60000] master.HMaster: Registered server found up in zk but who has not yet reported in: h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
      2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=60000] master.ServerManager: Triggering server recovery; existingServer h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 looks stale, new server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
      2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=60000] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
      ...
      2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=60000] master.HMaster: Region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 reported a fatal error:
      ABORTING region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as dead server
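
      The collision in the log above can be sketched in a few lines. This is only an illustration under stated assumptions, not the HBase code and not necessarily what the attached patches do: `ServerRegistry`, `Result` and `register` are hypothetical names, and the rule "same host, port and start code means the same server" is an assumption that cannot distinguish two separate processes whose timestamps collide.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the master's table of online region servers. */
final class ServerRegistry {
    /** Outcome of one registration attempt. */
    enum Result { ADDED, ALREADY_REGISTERED, REPLACED_STALE, REJECTED_OLDER }

    // "host:port" -> start code (the timestamp suffix of the server name).
    private final Map<String, Long> online = new HashMap<>();

    /**
     * Registers a server. The method is synchronized because, as in the log,
     * the call may come from the master initialization thread (server found
     * in ZK) and from an RPC handler (server reporting in) at about the same time.
     */
    synchronized Result register(String host, int port, long startCode) {
        String key = host + ":" + port;
        Long existing = online.get(key);
        if (existing == null) {
            online.put(key, startCode);
            return Result.ADDED;
        }
        if (existing == startCode) {
            // Assumption: identical host, port and start code is the same
            // process seen through a second source, so nothing is expired.
            return Result.ALREADY_REGISTERED;
        }
        if (existing < startCode) {
            // An older instance on the same host:port is treated as stale.
            online.put(key, startCode);
            return Result.REPLACED_STALE;
        }
        // The incoming registration is older than the recorded one.
        return Result.REJECTED_OLDER;
    }

    public static void main(String[] args) {
        ServerRegistry registry = new ServerRegistry();
        String host = "h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal";
        // First sighting, e.g. from the ZK scan during master initialization.
        System.out.println(registry.register(host, 60020, 1387451803800L)); // ADDED
        // Second sighting of the same name, e.g. from the RS report over RPC.
        System.out.println(registry.register(host, 60020, 1387451803800L)); // ALREADY_REGISTERED
    }
}
```

      In the log above, the second sighting is instead treated as "looks stale" even though the existing and new server names are identical, which is what leads to the YouAreDeadException.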
      
      

      Presumably some of the recent ZK listener related changes b

      Attachments

        1. HBASE-10210.01.patch
          10 kB
          Sergey Shelukhin
        2. HBASE-10210.02.patch
          10 kB
          Sergey Shelukhin
        3. HBASE-10210.03.patch
          10 kB
          Sergey Shelukhin
        4. HBASE-10210.04.patch
          10 kB
          Sergey Shelukhin
        5. HBASE-10210.05.patch
          10 kB
          Sergey Shelukhin
        6. HBASE-10210.patch
          11 kB
          Sergey Shelukhin

            People

              Assignee: Sergey Shelukhin (sershe)
              Reporter: Sergey Shelukhin (sershe)
              Votes: 0
              Watchers: 7
