Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-23764

Flaky tests due to ZK client name resolution delays

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Test
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0-alpha-1
    • 3.0.0-alpha-1, 2.3.0
    • test
    • None

    Description

      Nick Dimiduk and I ran into this issue (separately) and we noticed that there are some performance issues with name resolution in the Zookeeper client. Since we use ZK heavily in the unit tests, this often manifests as the following issues

      1. Test time outs starting the mini cluster (Master failed to start....)
      2. InterruptedException (because the tests timeout)
      3. Flaky tests because a subset of the cluster fails to start for whatever reason (replication tests especially because they spawn multiple clusters).
      4. ConnectionLoss to znode /hbase/xyzz.. JVM pause?

      I have strong feeling that this is a possible cause for many of our flaky tests in Jenkins. Luckily, it looks like the following workaround to switch to an IP address instead of hostname makes it much quicker. There are some related discussions in the ZK community (ZOOKEEPER-1666 and related jiras).

      --- a/hbase-common/src/main/resources/hbase-default.xml
      +++ b/hbase-common/src/main/resources/hbase-default.xml
      @@ -72,7 +72,7 @@ possible configurations would overwhelm and obscure the important.
         </property>
         <property>
           <name>hbase.zookeeper.quorum</name>
      -    <value>localhost</value>
      +    <value>127.0.0.1</value>
           <description>Comma separated list of servers in the ZooKeeper ensemble
           (This config. should have been named hbase.zookeeper.ensemble).
           For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      

      Until we figure out the actual root cause and a dependency upgrade (if needed), we should consider making this hostname to IP switch for more stable builds.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            bharathv Bharath Vissapragada Assign to me
            bharathv Bharath Vissapragada
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment