Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.90.6, 0.92.0
    • Component/s: Client
    • Labels:
      None

      Description

      After the client recovered from a temporary network failure,
      I found that my client thread was blocked.
      The stack trace and logs below show that an invalid CatalogTracker is used in the function "tableExists".

      Blocked thread stack:
      "WriteHbaseThread33" prio=10 tid=0x00007f76bc27a800 nid=0x2540 in Object.wait() [0x00007f76af4f3000]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
      - locked <0x00007f7a67817c98> (a java.util.concurrent.atomic.AtomicBoolean)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
      at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:427)
      at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:164)
      at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown Source)
      at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
      - locked <0x00007f7a4c5dc578> (a com.huawei.hdi.hbase.HbaseReOper)
      at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
      at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)

      ZooKeeperNodeTracker does not propagate the KeeperException to its caller.
      As a result, CatalogTracker assumes that ZooKeeperNodeTracker started
      successfully and continues processing.

      [WriteHbaseThread33]2011-12-16 17:07:33,153[WARN ] | hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Unable to get data of znode /hbase/root-region-server | org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:557)
      org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
      at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
      at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
      at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
      at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
      at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown Source)
      at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
      at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
      at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
      [WriteHbaseThread33]2011-12-16 17:07:33,361[ERROR] | hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Received unexpected KeeperException, re-throwing exception | org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.keeperException(ZooKeeperWatcher.java:385)
      org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
      at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
      at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
      at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
      at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
      at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown Source)
      at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
      at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
      at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)

      [WriteHbaseThread33]2011-12-16 17:07:33,361[FATAL] | Unexpected exception during initialization, aborting | org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1351)
      org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
      at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
      at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
      at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
      at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
      at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown Source)
      at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
      at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
      at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
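
      The failure mode described above, where a start routine logs a connection error and returns normally, can be illustrated with a minimal sketch. It does not use HBase or ZooKeeper classes: `readNode`, `SwallowingTracker`, `PropagatingTracker`, and `callerSeesFailure` are hypothetical names invented for this example, and the sketch is not the code from the attached patches.

      ```java
      import java.io.IOException;

      public class SwallowedStartFailure {

          /** Hypothetical stand-in for a ZooKeeper read that fails while the connection is down. */
          static byte[] readNode(boolean connected) throws IOException {
              if (!connected) {
                  throw new IOException("ConnectionLoss for /hbase/root-region-server");
              }
              return new byte[] {1};
          }

          /** Hypothetical tracker whose start() logs the failure and returns normally. */
          static class SwallowingTracker {
              void start(boolean connected) {
                  try {
                      readNode(connected);
                  } catch (IOException e) {
                      // The caller never learns that start failed.
                      System.out.println("WARN swallowed: " + e.getMessage());
                  }
              }
          }

          /** Hypothetical variant that lets the caller see the failure. */
          static class PropagatingTracker {
              void start(boolean connected) throws IOException {
                  readNode(connected);
              }
          }

          /** Returns true if the caller was told that start failed on a broken connection. */
          static boolean callerSeesFailure(boolean propagate) {
              try {
                  if (propagate) {
                      new PropagatingTracker().start(false);
                  } else {
                      new SwallowingTracker().start(false);
                  }
                  return false;
              } catch (IOException e) {
                  return true;
              }
          }

          public static void main(String[] args) {
              System.out.println("swallowing: caller sees failure = " + callerSeesFailure(false));
              System.out.println("propagating: caller sees failure = " + callerSeesFailure(true));
          }
      }
      ```

      With the swallowing variant, the caller proceeds as if the tracker were usable, which matches the symptom in the blocked stack: a later wait on a tracker that never started correctly.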

      1. HBASE-5060_trunk.patch
        2 kB
        gaojinchao
      2. HBASE-5060_Branch90trial.patch
        2 kB
        gaojinchao

        Activity

        gaojinchao added a comment -

        1. This issue is difficult to fix. I made a trial version.
        2. I have checked all usages of ZooKeeperNodeTracker; they seem fine.

        Ted Yu added a comment -

        @Jinchao:
        Can you attach a patch for TRUNK?

        ramkrishna.s.vasudevan added a comment -

        +1 on patch... good work gao..

        gaojinchao added a comment -

        Patch for trunk

        gaojinchao added a comment -

        Test case passed.
        My test code:

            try {
                HBaseAdmin hbase = new HBaseAdmin(config);
                while (true) {
                    try {
                        // With the fix, this call returns or throws instead of blocking.
                        if (hbase.tableExists(tableName)) {
                            System.out.println("[FATAL] The usertable: " + tableName + " is already existed");
                        }
                        try {
                            Thread.sleep(50);
                        } catch (InterruptedException e) {
                            continue;
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                        continue;
                    }
                }
            } catch (IOException e) {
                // Assumed handler: the original snippet left the outer try unclosed.
                e.printStackTrace();
            }

        Steps:
        1. Run the test case.
        2. Kill two of the three ZooKeeper servers.
        3. Start the killed servers again.
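
        A loop like the one above cannot by itself distinguish "still blocked" from "slow". One way to make a hang observable is to run each call on a worker thread and bound the wait. The sketch below is a generic harness and an assumption of this write-up, not part of the patch; `HangDetector` and `probe` are hypothetical names, and a real test would pass the `tableExists` call in place of the stand-in lambdas.

        ```java
        import java.util.concurrent.Callable;
        import java.util.concurrent.ExecutionException;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.TimeoutException;

        public class HangDetector {

            /** Runs the call on a daemon worker thread and reports whether it returned, failed, or hung. */
            static String probe(Callable<Boolean> call, long timeoutMs) throws InterruptedException {
                ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
                    Thread t = new Thread(r, "probe");
                    t.setDaemon(true); // a stuck call must not keep the JVM alive
                    return t;
                });
                Future<Boolean> result = pool.submit(call);
                try {
                    return "RETURNED " + result.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    result.cancel(true); // interrupt the worker; the call may or may not honor it
                    return "HUNG";
                } catch (ExecutionException e) {
                    return "FAILED " + e.getCause().getClass().getSimpleName();
                } finally {
                    pool.shutdownNow();
                }
            }

            public static void main(String[] args) throws InterruptedException {
                // Stand-ins for a call that succeeds, one that throws, and one that blocks.
                System.out.println(probe(() -> true, 1000));
                System.out.println(probe(() -> { throw new java.io.IOException("zk down"); }, 1000));
                System.out.println(probe(() -> { Thread.sleep(60_000); return true; }, 200));
            }
        }
        ```

        A call that throws an IOException while ZooKeeper is down counts as acceptable here; only a call that never returns is reported as hung.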

        Ted Yu added a comment -

        +1 if tests pass.

        Ted Yu added a comment -

        Since this is critical, we should include this in 0.92.0

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12507874/HBASE-5060_trunk.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -152 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/540//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/540//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/540//console

        This message is automatically generated.

        Ted Yu added a comment -

        TestTableMapReduce and TestHFileOutputFormat passed with patch on TRUNK.

        Will integrate later today.

        stack added a comment -

        +1 Small change. Thanks Jinchao.

        Ted Yu added a comment -

        Integrated to 0.90, 0.92 and TRUNK.

        Thanks for the patch Jinchao.

        Thanks for the review Ram and Stack.

        Ted Yu added a comment -
        Hanging test: Running org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed
        Hanging test: Running org.apache.hadoop.hbase.replication.TestMasterReplication
        

        I verified that the above tests passed on TRUNK, on MacBook.

        Hudson added a comment -

        Integrated in HBase-TRUNK #2559 (See https://builds.apache.org/job/HBase-TRUNK/2559/)
        HBASE-5060 HBase client is blocked forever (Jinchao)

        tedyu :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
        Hudson added a comment -

        Integrated in HBase-0.92 #199 (See https://builds.apache.org/job/HBase-0.92/199/)
        HBASE-5060 HBase client is blocked forever (Jinchao)

        tedyu :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #38 (See https://builds.apache.org/job/HBase-TRUNK-security/38/)
        HBASE-5060 HBase client is blocked forever (Jinchao)

        tedyu :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
        Hudson added a comment -

        Integrated in HBase-0.92-security #45 (See https://builds.apache.org/job/HBase-0.92-security/45/)
        HBASE-5060 HBase client is blocked forever (Jinchao)

        tedyu :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
        stack added a comment -

        Was committed a few days ago.


          People

          • Assignee:
            gaojinchao
            Reporter:
            gaojinchao
          • Votes:
            0
            Watchers:
            3
