Apache Ozone / HDDS-1127

Fix failing and intermittent Ozone unit tests


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Done
    • None
    • None
    • None
    • HDDS BadLands

    Description

      A full Ozone build with acceptance and unit tests takes ~1.5 hours.

      Over the last 30 hours, I started a new full build every 2 hours and collected all the results.

      We have ~1200 test methods, and ~15 of them failed more than 4 times out of the 17 runs.
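      To make the selection rule concrete, here is a minimal sketch, assuming the results of each build are available as the set of failed test method names for that run. The class FlakyTestReport, the method frequentFailures and the sample data are hypothetical illustrations, not part of the Ozone build tooling. The sketch counts failures per test method across all runs and keeps those that failed more than a given number of times (4 in the analysis above):

```java
import java.util.*;

public class FlakyTestReport {

    /**
     * Returns the test methods that failed more than maxFailures times.
     * Each element of failedPerRun holds the failed test names of one build.
     */
    public static SortedSet<String> frequentFailures(List<Set<String>> failedPerRun, int maxFailures) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> run : failedPerRun) {
            for (String test : run) {
                // Increment the failure counter of this test method.
                counts.merge(test, 1, Integer::sum);
            }
        }
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > maxFailures) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: 17 runs; TestA#testX fails in 5 runs, TestB#testY in 3 runs.
        List<Set<String>> runs = new ArrayList<>();
        for (int i = 0; i < 17; i++) {
            Set<String> failed = new HashSet<>();
            if (i < 5) {
                failed.add("TestA#testX");
            }
            if (i % 8 == 0) { // runs 0, 8 and 16
                failed.add("TestB#testY");
            }
            runs.add(failed);
        }
        System.out.println(frequentFailures(runs, 4)); // prints [TestA#testX]
    }
}
```

      With 17 runs and a threshold of 4, a test that failed in 5 runs is reported, while one that failed in only 3 runs is not.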

      I propose the following method to fix them:

      1. Disable them immediately (e.g. with JUnit's @Ignore annotation) so that the pre-commit checks report real failures.
      2. Create a Jira for every failing test, with an assignee (I would choose the assignee based on the history of the unit test). We can adjust the assignee later, but I would prefer to use a default person instead of creating unassigned Jiras.
      3. Fix them and enable the tests again.


        Issue Links

          1. TestSCMChillModeManager is failing with NullPointerException (Sub-task, Resolved, Lokesh Jain)
          2. Fix failing unit test methods of TestDeadNodeHandler (Sub-task, Resolved, Nandakumar)
          3. testDelegationToken is failing in TestSecureOzoneCluster (Sub-task, Resolved, Ajay Kumar; time spent: 40m)
          4. Fix failing unit tests in TestOzoneManager (Sub-task, Resolved, Nandakumar)
          5. TestOzoneManagerHA.testTwoOMNodesDown is failing with ratis error (Sub-task, Resolved, Hanisha Koneru; time spent: 1h 20m)
          6. Disable failing test which are tracked by a separated jira (Sub-task, Resolved, Marton Elek; time spent: 1h)
          7. TestRandomKeyGenerator fails with NPE (Sub-task, Resolved, Marton Elek; time spent: 1h)
          8. Test SCMChillMode failing randomly in Jenkins run (Sub-task, Resolved, Bharat Viswanadham; time spent: 2h)
          9. TestContainerActionsHandler.testCloseContainerAction has an intermittent failure (Sub-task, Resolved, Marton Elek; time spent: 50m)
          10. Remove TestContainerSQLCli unit test stub (Sub-task, Resolved, Marton Elek; time spent: 50m)
          11. Enable TestSCMNodeManager#testScmStatsFromNodeReport (Sub-task, Resolved, Chun-Hao Yang)
          12. Enable TestSCMNodeManager#testScmNodeReportUpdate (Sub-task, Resolved, Chun-Hao Yang)
          13. Enable TestNodeFailure test cases (Sub-task, Resolved, Nandakumar)
          14. Enable TestOzoneRpcClientWithRatis test cases (Sub-task, Resolved, Prashant Pogde)
          15. Enable TestOmMetrics#testBucketOps (Sub-task, Resolved, Chun-Hao Yang)
          16. Enable TestSCMPipelineMetrics test cases (Sub-task, Resolved, Chun-Hao Yang)
          17. Enable TestCloseContainerByPipeline test cases (Sub-task, Resolved, Prashant Pogde)
          18. Enable TestCloseContainerHandlingByClient test cases (Sub-task, Resolved, Shashikant Banerjee)
          19. Enable TestContainerStateMachineFailures test cases (Sub-task, Resolved, Shashikant Banerjee)
          20. Enable TestOzoneRpcClientAbstract#testPutKeyRatisThreeNodesParallel (Sub-task, Resolved, Prashant Pogde)
          21. Enable TestFreonWithPipelineDestroy test cases (Sub-task, Resolved, Chun-Hao Yang)
          22. Enable TestOzoneRpcClientAbstract#testListVolume (Sub-task, Resolved, Prashant Pogde)
          23. Enable TestWatchForCommit test cases (Sub-task, Resolved, Shashikant Banerjee)
          24. Enable TestContainerReplicationEndToEnd test cases (Sub-task, Resolved, Prashant Pogde)
          25. Enable TestStorageContainerManager test cases (Sub-task, Resolved, Prashant Pogde)
          26. Enable TestRatisPipelineCreateAndDestroy test cases (Sub-task, Resolved, pratap chandu)
          27. Enable TestOMRatisSnapshots test cases (Sub-task, Resolved, Hanisha Koneru)
          28. Enable TestOzoneContainer test cases (Sub-task, Resolved, Prashant Pogde)
          29. Enable TestBlockDeletion test cases (Sub-task, Resolved, Lokesh Jain)
          30. Enable TestSCMSafeModeWithPipelineRules test cases (Sub-task, Resolved, Sadanand Shenoy)
          31. Enable TestOzoneAtRestEncryption test cases (Sub-task, Resolved, Unassigned)
          32. Enable TestContainerReplication test cases (Sub-task, Resolved, Sadanand Shenoy)
          33. Enable TestCSMMetrics test cases (Sub-task, Resolved, Aryan Gupta)
          34. Enable TestOzoneContainerRatis test cases (Sub-task, Resolved, Unassigned)
          35. Enable TestRatisManager test cases (Sub-task, Resolved, Unassigned)
          36. Enable TestSecureContainerServer test cases (Sub-task, Resolved, Unassigned)
          37. Enable TestKeyManagerImpl test cases (Sub-task, Resolved, Aryan Gupta)
          38. Enable TestOmSQLCli#testOmDB (Sub-task, Resolved, Unassigned)
          39. Enable TestOzoneManagerHA test cases (Sub-task, Resolved, Unassigned)
          40. Enable TestOzoneManagerRestart test cases (Sub-task, Resolved, Prashant Pogde)
          41. Enable TestScmSafeMode test cases (Sub-task, Resolved, Unassigned)
          42. Enable TestBucketManagerImpl test cases (Sub-task, Resolved, Unassigned)
          43. Flaky TestWatchForCommit#test2WayCommitForTimeoutException (Sub-task, Resolved, Unassigned)
          44. ITestRootedOzoneContract tests are flaky (Sub-task, Resolved, Siyao Meng)
          45. Intermittent failure in Recon acceptance test due to too many pipelines (Sub-task, Resolved, Unassigned)
          46. Intermittent failure in TestDeleteWithSlowFollower (Sub-task, Resolved, Attila Doroszlai)
          47. Intermittent failure in TestReadRetries (Sub-task, Resolved, Unassigned)
          48. Intermittent failure in testContainerImportExport (Sub-task, Resolved, Attila Doroszlai)
          49. Intermittent test failure related to a race conditon during PipelineManager close (Sub-task, Resolved, Marton Elek)
          50. TestOzoneManagerRocksDBLogging.shutdown times out (Sub-task, Resolved, Unassigned)
          51. TestOzoneClientKeyGenerator is flaky (Sub-task, Resolved, Aryan Gupta)
          52. Flaky test TestContainerStateMachineFailureOnRead#testReadStateMachineFailureClosesPipeline (Sub-task, Resolved, Aryan Gupta)
          53. Intermittent failure in TestOzoneDelegationTokenSecretManager (Sub-task, Resolved, Unassigned)
          54. Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3#testDiscardPreallocatedBlocks (Sub-task, Resolved, Aryan Gupta)
          55. Fix flaky TestContainerStateMachineFailures#testApplyTransactionFailure (Sub-task, Resolved, Unassigned)
          56. TestBlockOutputStreamWithFailures is flaky (Sub-task, Resolved, Unassigned)
          57. Fix flaky test TestWatchForCommit#testWatchForCommitWithKeyWrite (Sub-task, Resolved, Unassigned)
          58. TestOzoneClientRetriesOnException is flaky (Sub-task, Resolved, Unassigned)
          59. TestRatisPipelineUtils.testPipelineCreationOnNodeRestart is flaky (Sub-task, Resolved, Unassigned)
          60. FLAKY-UT: TestFailureHandlingByClientFlushDelay timeout (Sub-task, Resolved, Unassigned)
          61. FLAKY-UT: TestSecureOzoneRpcClient timeout (Sub-task, Resolved, Unassigned)
          62. Intermittent timeout in TestRatisPipelineLeader (Sub-task, Resolved, Xu Shao Hong)
          63. Intermittent timeout in TestOzoneRpcClient (Sub-task, Resolved, Unassigned)
          64. Intermittent failure in TestEndPoint#testGetVersionAssertRpcTimeOut (Sub-task, Resolved, Unassigned)
          65. SCM sometimes cannot exit safe mode (Sub-task, Resolved, Unassigned)
          66. Intermittent failure in TestSCMContainerPlacementPolicyMetrics (Sub-task, Resolved, Unassigned)
          67. TestBlockManager#testMultipleBlockAllocationWithClosedContainer timed out (Sub-task, Resolved, Unassigned)
          68. TestMiniChaosOzoneCluster may run until OOME (Sub-task, Resolved, Unassigned)
          69. TestRandomKeyGenerator fails due to timeout (Sub-task, Resolved, Aryan Gupta)
          70. TestNodeFailure times out intermittently (Sub-task, Resolved, Unassigned)
          71. TestNodeReportHandler failing because of NPE (Sub-task, Resolved, Nandakumar)
          72. OM HA S3 test failure (Sub-task, Resolved, Unassigned)
          73. Fail Unit Test: TestMiniChaosOzoneCluster (Sub-task, Resolved, Jie Wang; time spent: 10m)
          74. TestOzoneManagerHA#testOMProxyProviderFailoverOnConnectionFailure fails intermittently (Sub-task, Resolved, Hanisha Koneru)
          75. TestContainerStateManagerIntegration#testReplicaMap fails with ChillModePrecheck (Sub-task, Resolved, Unassigned)
          76. Fix TestOzoneManagerHttpServer & TestStorageContainerManagerHttpServer (Sub-task, Resolved, Nandakumar)
          77. TestContainerPersistence#testDeleteBlockTwice is failing (Sub-task, Resolved, Unassigned)
          78. TestOzoneManagerHttpServer#testHttpPolicy fails intermittently (Sub-task, Resolved, Unassigned)
          79. TestTableCacheImpl#testPartialTableCacheWithOverrideAndDelete fails intermittently (Sub-task, Resolved, Unassigned)
          80. Some ozone unit test takes too long to finish. (Sub-task, Resolved, Unassigned)
          81. Allocate block fails in MiniOzoneChaosCluster because of InsufficientDatanodesException (Sub-task, Resolved, Unassigned)
          82. FLAKY-UT: TestWatchForCommit#test2WayCommitForTimeoutException (Sub-task, Resolved, Unassigned)
          83. FLAKY-UT: TestWatchForCommit#testWatchForCommitForGroupMismatchException (Sub-task, Resolved, Attila Doroszlai)
          84. FLAKY-UT: TestCommitWatcher#testReleaseBuffersOnException (Sub-task, Resolved, Unassigned)
          85. TestKeyValueContainer#testContainerImportExport[0] (Sub-task, Resolved, Unassigned)
          86. Wrong mock assumption in TestOmMetrics#testBucketOps (Sub-task, Resolved, Aswin Shakil)


            People

              Assignee: Unassigned
              Reporter: Marton Elek (elek)
              Votes: 0
              Watchers: 13


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 7h 50m