  1. HBase
  2. HBASE-6695

[Replication] Data is lost if a RegionServer goes down during transferqueue


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.94.1
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

    Description

      While testing the Replication failover feature, we found that if a regionserver is killed while it is running transferqueue, only some of the hlog znodes are copied to the right path, because the failover process is interrupted.

      Log:

      2012-08-29 12:20:05,660 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Moving dw92.kgb.sqa.cm4,60020,1346210789716's hlogs to my queue

      2012-08-29 12:20:05,765 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213720708 with data 210508162
      2012-08-29 12:20:05,850 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213886800 with data

      2012-08-29 12:20:05,938 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213830559 with data

      2012-08-29 12:20:06,055 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dw92.kgb.sqa.cm4%2C60020%2C1346210789716.1346213775146 with data

      2012-08-29 12:20:06,277 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=.META.,,1.1028785192, hostname=dw93.kgb.sqa.cm4, port=60020
      java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused
      at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
      at java.util.concurrent.FutureTask.get(FutureTask.java:83)
      at
      ......

      This server is down .....

      ZK node status:

      [zk: 10.232.98.77:2181(CONNECTED) 6] ls /hbase-test3-repl/replication/rs/dw92.kgb.sqa.cm4,60020,1346210789716
      [lock, 1, 1-dw89.kgb.sqa.cm4,60020,1346202436268]

      dw92 is down, but the znode dw92.kgb.sqa.cm4,60020,1346210789716 can't be deleted.
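
The failure mode described above can be illustrated with a toy in-memory model. This is only a sketch and not HBase code: the class `QueueTransferSketch`, the method `transfer`, the `crashAfter` parameter, and the hlog names are all hypothetical. It assumes the transfer takes a lock on the dead server's znode, copies the hlog entries one at a time, and removes the old znode only after every entry has been copied. It also models a single queue, whereas the real znode above has two (`1` and `1-dw89...`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a replication queue transfer; illustrative only, not HBase code. */
class QueueTransferSketch {

    /**
     * Copies hlog entries from the dead server's queue one by one.
     * If crashAfter is smaller than the queue size, the method returns early
     * after that many copies, mimicking a server killed mid-transfer.
     */
    static Map<String, String> transfer(Map<String, String> deadQueue,
                                        List<String> deadRsChildren,
                                        int crashAfter) {
        Map<String, String> copied = new LinkedHashMap<>();
        deadRsChildren.add("lock"); // claim the dead server's queues
        int done = 0;
        for (Map.Entry<String, String> entry : deadQueue.entrySet()) {
            if (done == crashAfter) {
                // Simulated crash: the lock and the old queue stay behind.
                return copied;
            }
            copied.put(entry.getKey(), entry.getValue());
            done++;
        }
        // Reached only when every hlog was copied: remove the old znode's children.
        deadRsChildren.clear();
        return copied;
    }

    public static void main(String[] args) {
        Map<String, String> deadQueue = new LinkedHashMap<>();
        deadQueue.put("hlog-a", "210508162"); // value stands for a replication position
        deadQueue.put("hlog-b", "");
        deadQueue.put("hlog-c", "");
        deadQueue.put("hlog-d", "");
        List<String> children = new ArrayList<>(List.of("1"));

        Map<String, String> copied = transfer(deadQueue, children, 2);
        System.out.println("copied hlogs: " + copied.keySet());
        System.out.println("children left under dead server: " + children);
    }
}
```

In this model an interrupted run leaves both the `lock` child and the original queue under the dead server's znode, while the new owner holds only part of the hlog list. That is the shape of the state shown in the `ls` output above. One way to avoid the partial state, under the same assumptions, would be to perform the copy and the delete as a single atomic step.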

      Attachments

        1. HBASE-6695.patch
          3 kB
          terry zhang
        2. HBASE-6695-4trunk_v2.patch
          3 kB
          terry zhang
        3. HBASE-6695-4trunk_v3.patch
          4 kB
          terry zhang
        4. HBASE-6695-4trunk.patch
          3 kB
          terry zhang

            People

              Assignee: Unassigned
              Reporter: terry zhang
              Votes: 0
              Watchers: 7
