  1. Apache Ozone
  2. HDDS-2823 SCM HA Support
  3. HDDS-5216

ozone freon randomkeys failed after leader SCM node is down


Details

    Description

      Under .../compose/ozone-ha, create an HA cluster:

      docker-compose up -d --scale datanode=3
      

      The initial SCM roles are as follows:

      bash-4.2$ ozone admin scm roles
      [scm1:9865:FOLLOWER, scm2:9865:FOLLOWER, scm3:9865:LEADER]
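
To script the "stop the leader" step, the leader's host name has to be pulled out of the roles output. A minimal sketch, assuming the output keeps the bracketed `host:port:ROLE` format shown above; `scm_leader` is a hypothetical helper, not part of the ozone CLI:

```shell
# Hypothetical helper: print the host whose role is LEADER.
# Input: one line in the format printed by `ozone admin scm roles` above.
scm_leader() {
  printf '%s\n' "$1" \
    | sed 's/[][ ]//g' \
    | tr ',' '\n' \
    | awk -F: '$3 == "LEADER" { print $1 }'
}

# Sample input copied from the output in this report.
roles='[scm1:9865:FOLLOWER, scm2:9865:FOLLOWER, scm3:9865:LEADER]'
leader=$(scm_leader "$roles")
echo "$leader"
```

On a live cluster the same helper could presumably be fed with `scm_leader "$(ozone admin scm roles)"`.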
      

      Run the freon random key generator as follows:

      ozone freon randomkeys --numOfVolumes=10 --numOfBuckets 50 --numOfKeys 50  --replicationType=RATIS --factor=THREE
      

      While freon randomkeys is running, put all SCM nodes under blockade and stop the leader SCM node (scm3 in this run):

      blockade status:

      NODE             CONTAINER ID    STATUS  IP              NETWORK    PARTITION
      ozone-ha_scm1_1  18f9c1e2d52f    UP      172.31.0.9      NORMAL
      ozone-ha_scm2_1  25c74f0a9271    UP      172.31.0.6      NORMAL
      ozone-ha_scm3_1  8808d10ccb3a    DOWN                    UNKNOWN
      

       

      freon randomkeys failed with the following errors (excerpt of the test output):

      6:00:30,131 [pool-2-thread-3] ERROR freon.RandomKeyGenerator: Exception while adding key: key-21-80493 in bucket: bucket-44-63818 of volume: vol-1-95998.
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No Route to Host from  om1/172.31.0.11 to scm3:9863 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  [http://wiki.apache.org/hadoop/NoRouteToHost]
      at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:604)
      at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.openKey(OzoneManagerProtocolClientSideTranslatorPB.java:595)
      at org.apache.hadoop.ozone.client.rpc.RpcClient.createKey(RpcClient.java:756)
      at org.apache.hadoop.ozone.client.OzoneBucket.createKey(OzoneBucket.java:502)
      at org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:703)
      at org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:86)
      at org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:621)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:834)
       44.10% |?????????????????????????????????????????????                                                        |  11024/25000 Time: 0:09:00
      2021-05-11 06:00:37,231 [pool-2-thread-7] INFO metrics.RatisMetrics: Creating Metrics Registry : ratis.client_message_metrics.client-EA7B54107DBD->4c51bca8-cc0a-4c20-84dd-b5a7cb18c4ac
      2021-05-11 06:00:37,231 [pool-2-thread-7] WARN impl.MetricRegistriesImpl: First MetricRegistry has been created without registering reporters. You may need to call MetricRegistries.global().addReporterRegistration(...) before.
       100.00% |?????????????????????????????????????????????????????????????????????????????????????????????????????|  25000/25000 Time: 0:15:39
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: No Route to Host from  om1/172.31.0.11 to scm:9863 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  [http://wiki.apache.org/hadoop/NoRouteToHost]
      ***************************************************
      Status: Failed
      Git Base Revision: 7a3bc90b05f257c8ace2f76d74264906f0f7a932
      Number of Volumes created: 10
      Number of Buckets created: 500
      Number of Keys added: 24991
      Ratis replication factor: THREE
      Ratis replication type: RATIS
      Average Time spent in volume creation: 00:00:00,114
      Average Time spent in bucket creation: 00:00:01,263
      Average Time spent in key creation: 00:02:48,698
      Average Time spent in key write: 00:00:04,216
      Total bytes written: 255907840
      Total Execution time: 00:15:39,968
      ***************************************************
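
The summary can be cross-checked against the command-line arguments. A small sketch, assuming `--numOfBuckets` counts buckets per volume and `--numOfKeys` counts keys per bucket, which matches the 500 buckets and the 25000 total in the progress bar:

```shell
# Values taken from the freon command line and summary in this report.
volumes=10
buckets_per_volume=50
keys_per_bucket=50
keys_added=24991   # "Number of Keys added" from the summary

# Assumption: bucket and key counts are per parent (volume / bucket).
expected_buckets=$((volumes * buckets_per_volume))
expected_keys=$((expected_buckets * keys_per_bucket))
missing_keys=$((expected_keys - keys_added))

echo "expected buckets: $expected_buckets"
echo "expected keys:    $expected_keys"
echo "keys not added:   $missing_keys"
```

Under that assumption, all volumes and buckets were created and only a handful of key writes did not complete, even though the run is reported as Failed.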
      

      In this case, I would expect the freon test to still finish successfully.


          People

            bharat Bharat Viswanadham
            ghuangups George Huang