Hadoop HDFS / HDFS-15646

Track failing tests in HDFS


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None

    Description

      Several unit tests have been failing consistently on Yetus for a long period of time.
      The list keeps growing, and it is driving the repository into an unstable state. Qbt reports more than 40 failing unit tests on average.

      Personally, over the last week, with every patch I submitted, I have had to spend considerable time looking at the same stack traces to double-check whether or not the patch contributes to those failures.

      I found out that the majority of those tests had been failing for quite some time, but no Jiras were filed.

      The main problem with those consistent failures is that they have side effects on the runtime of the other JUnit tests, because they consume resources such as memory and ports.

      StripedFile and EC tests in particular show up in the list of bad tests 100% of the time.
      I looked at those tests and they certainly need some improvements (e.g., HDFS-15459). Is anyone interested in those test cases? Can we just turn them off?

      I would like to give a heads-up that we need more collaboration to enforce the stability of the codebase.

      • All developers: please file a Jira whenever you see a failing test, whether or not it is related to your patch. This gives other developers a heads-up about potential failures. Please do not stop at commenting on your patch "this is unrelated to my work".
      • Volunteer to dedicate more time to fixing flaky tests.
      • Periodically make sure that the number of failing tests does not exceed a certain threshold. We have Qbt reports to monitor that, but there is no follow-up on their status.
      • We should consider aggressive strategies, such as blocking any merges until the code is brought back to stability.
      • We need a clear and well-defined process for addressing Yetus issues: configuration, investigating out-of-memory errors, slowness, etc.
      • Turn off the JUnit tests in modules that are not being actively used in the community (e.g., EC or striped files).
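
      To make the first and third points concrete, here is a minimal sketch of a periodic check. It is only an illustration, not an existing Yetus or Qbt feature: the function name `check_stability`, the `StabilityReport` type, and the default budget of 40 are all hypothetical, and the sketch assumes the failing test names and the names already covered by a Jira are available as plain lists of strings.

```python
from dataclasses import dataclass


@dataclass
class StabilityReport:
    total_failing: int   # number of distinct failing tests
    over_budget: bool    # True when the count exceeds the agreed budget
    untracked: list      # failing tests that have no Jira filed yet


def check_stability(failing_tests, tracked_tests, max_failing=40):
    """Compare a nightly failure list against a budget and the filed Jiras.

    failing_tests: test names taken from a report (assumed input format).
    tracked_tests: test names that already have a Jira (assumed input format).
    max_failing:   hypothetical budget; the community would have to pick it.
    """
    failing = set(failing_tests)
    untracked = sorted(failing - set(tracked_tests))
    return StabilityReport(
        total_failing=len(failing),
        over_budget=len(failing) > max_failing,
        untracked=untracked,
    )
```

      A job could call `check_stability(...)` after each nightly run and post the `untracked` names somewhere visible, so that "unrelated to my work" failures still end up with a Jira.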

       

      CC: Akira Ajisaka, Íñigo Goiri, Kihwal Lee, Daryn Sharp, Wei-Chiu Chuang

      Do you guys have any thoughts on the current status of HDFS?

       

      The following is a quick list of failing JUnit tests from the Qbt reports:

       

       Test name | Duration | Age

       org.apache.hadoop.crypto.key.kms.server.TestKMS.testKMSProviderCaching | 1.5 sec | 1
       org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata | 42 ms | 3
       org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata | 46 ms | 3
       org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata | 27 ms | 3
       org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata | 19 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testNoTempBlobsVisible | 0.95 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testLinkBlobs | 33 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatusRootDir | 31 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryMoveToExistingDirectory | 0.25 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatus | 29 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryAsExistingDirectory | 36 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameToDirWithSamePrefixAllowed | 23 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testLSRootDir | 19 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testDeleteRecursively | 31 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck.testWasbFsck | 1 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testChineseCharactersFolderRename | 1 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListingWithZeroByteRenameMetadata | 41 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListing | 37 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testUriEncoding | 38 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testDeepFileCreation | 37 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testListDirectory | 29 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderRenameInProgress | 37 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameFolder | 34 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameImplicitFolder | 27 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolder | 66 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testStoreDeleteFolder | 27 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRename | 40 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatus | 36 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testRenameDirectoryAsEmptyDirectory | 0.26 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatusFilterWithSomeMatches | 23 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testRenameDirectoryAsNonExistentDirectory | 28 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testGlobStatusSomeMatchesInDirectories | 26 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testGlobStatusWithMultipleWildCardMatches | 27 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testDeleteRecursively | 22 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testImplicitFolderDeleted | 0.99 sec | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testFileAndImplicitFolderSameName | 31 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testSetOwnerOnImplicitFolder | 26 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testFileInImplicitFolderDeleted | 30 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testImplicitFolderListed | 22 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testCreatingDeepFileCreatesExplicitFolder | 53 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testSetPermissionOnImplicitFolder | 22 ms | 3
       org.apache.hadoop.fs.azure.TestWasbFsck.testDelete | 1 sec | 3
       org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers | 1 min 30 sec | 17
       org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
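
       A list like the one above is easier to triage when grouped by test class. The following is a small sketch of that grouping; the helper name `failures_by_class` is hypothetical, and it assumes each entry has already been reduced to a fully qualified `package.Class.method` name with the duration and age stripped.

```python
from collections import Counter


def failures_by_class(test_names):
    """Count failing tests per test class, most affected class first.

    Each name is assumed to look like 'package.Class.method'; everything
    before the last dot is treated as the class.
    """
    counts = Counter(name.rsplit(".", 1)[0] for name in test_names)
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata",
        "org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata",
        "org.apache.hadoop.fs.azure.TestWasbFsck.testDelete",
    ]
    for cls, count in failures_by_class(sample):
        print(f"{count:3d}  {cls}")
```

       Grouping this way makes it visible when a single class or module accounts for most of the failures, which helps decide whether to fix or disable that suite as a whole.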

       

       

        Issue Links

        1. TestPersistBlocks#TestRestartDfsWithFlush flaky failure (Sub-task; Resolved; Unassigned)
        2. Occasional failure in TestDFSClientRetries#testGetFileChecksum because the number of available xcievers is set too low (Sub-task; Resolved; Unassigned)
        3. TestEditLogTailer is flaky (Sub-task; Resolved; Unassigned)
        4. TestBlockTokenWithDFSStriped fails intermittently (Sub-task; Resolved; Ahmed Hussein)
        5. TestDFSClientRetries#testGetFileChecksum fails intermittently (Sub-task; Resolved; Ahmed Hussein; time spent 1.5h)
        6. TestHAAppend#testMultipleAppendsDuringCatchupTailing is flaky (Sub-task; Resolved; Ahmed Hussein; time spent 1h 20m)
        7. TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s (Sub-task; Resolved; Sammi Chen)
        8. TestFileCreation#testServerDefaultsWithMinimalCaching fails intermittently (Sub-task; Resolved; Ahmed Hussein; time spent 2h 10m)
        9. TestFsDatasetImpl fails intermittently (Sub-task; Resolved; Ahmed Hussein; time spent 50m)
        10. TestBPOfferService#testMissBlocksWhenReregister fails intermittently (Sub-task; Resolved; Ahmed Hussein; time spent 1h 40m)
        11. RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException (Sub-task; Resolved; Akira Ajisaka; time spent 2h)
        12. EC: Fix checksum computation in case of native encoders (Sub-task; Resolved; Ayush Saxena; time spent 4h 40m)
        13. Flaky test TestSnapshotFileLength.testSnapshotfileLength (Sub-task; Resolved; Ahmed Hussein)
        14. TestBPOfferService#testMissBlocksWhenReregister is flaky (Sub-task; Resolved; Unassigned)
        15. Disable Broken Azure Junits (Sub-task; Resolved; Ahmed Hussein; time spent 20m)
        16. Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster (Sub-task; Resolved; Ahmed Hussein; time spent 3h 40m)
        17. TestBPOfferService#testMissBlocksWhenReregister fails on trunk (Sub-task; Resolved; Masatake Iwasaki; time spent 1h)
        18. TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk (Sub-task; Resolved; Masatake Iwasaki; time spent 40m)
        19. TestRouterRpcMultiDestination#testNamenodeMetrics fails on trunk (Sub-task; Resolved; Masatake Iwasaki)
        20. Testcase TestBalancer#testBalancerWithPinnedBlocks always fails (Sub-task; Resolved; Unassigned)
        21. TestReconstructStripedFile#testNNSendsErasureCodingTasks fails intermittently (Sub-task; Resolved; Hemanth Boyina)
        22. TestDistributedFileSystem#testGetFileBlockStorageLocationsBatching fails intermittently (Sub-task; Resolved; Unassigned)
        23. TestDFSOutputStream#testCloseTwice implementation is broken (Sub-task; Resolved; Ahmed Hussein)
        24. TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and branch-3.2 (Sub-task; Resolved; Chao Sun; time spent 50m)
        25. TestUpgradeDomainBlockPlacementPolicy flaky (Sub-task; Resolved; Ahmed Hussein; time spent 1h 50m)
        26. TestFileChecksum should be parameterized (Sub-task; Resolved; Masatake Iwasaki; time spent 1h 40m)
        27. Fix intermittent falilure of TestDecommission#testAllocAndIBRWhileDecommission (Sub-task; Resolved; Masatake Iwasaki; time spent 40m)
        28. TestMultipleNNPortQOP#testMultipleNNPortOverwriteDownStream fails intermittently (Sub-task; Resolved; Toshihiko Uchida; time spent 3h 40m)
        29. TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy fails on trunk (Sub-task; Resolved; Masatake Iwasaki; time spent 1h)
        30. Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig (Sub-task; Resolved; Leon Gao; time spent 1h 20m)
        31. TestBalancer#testMaxIterationTime fails sporadically (Sub-task; Resolved; Toshihiko Uchida; time spent 40m)
        32. TestRouterRpcMultiDestination#testProxyGetTransactionID and testProxyVersionRequest are flaky (Sub-task; Resolved; Akira Ajisaka; time spent 1h)
        33. RBF: TestRouterFederationRename is flaky (Sub-task; Resolved; Unassigned)
        34. TestBalancerRPCDelay. testBalancerRPCDelayQpsDefault fails intermittently (Sub-task; Resolved; Ahmed Hussein)
        35. TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk (Sub-task; Resolved; Ahmed Hussein; time spent 0.5h)
        36. Some tests in TestBlockRecovery are consistently failing (Sub-task; Resolved; Viraj Jasani; time spent 6h 20m)
        37. TestBlockRecovery fails consistently on Branch-2.10 (Sub-task; Resolved; Ahmed Hussein; time spent 40m)
        38. TestDecommissioningStatus#testDecommissionStatus fails intermittently (Sub-task; Resolved; Ajay Kumar; time spent 40m)
        39. TestBootstrapAliasmap fails by BindException (Sub-task; Resolved; Akira Ajisaka; time spent 50m)
        40. TestEditLogTailer#testStandbyTriggersLogRollsWhenTailInProgressEdits is flaky (Sub-task; Resolved; Viraj Jasani; time spent 10h 50m)
        41. De-flake testDecommissionStatus (Sub-task; Resolved; Viraj Jasani; time spent 2h 40m)
        42. De-flake TestBlockScanner#testSkipRecentAccessFile (Sub-task; Resolved; Viraj Jasani; time spent 2h 20m)
        43. Flaky test TestFsDatasetImpl#testDnRestartWithHardLink (Sub-task; Resolved; Viraj Jasani; time spent 9.5h)
        44. testMoverWithStripedFile fails intermittently (Sub-task; Resolved; Viraj Jasani; time spent 3h 10m)
        45. RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk (Sub-task; Reopened; Fengnan Li)
        46. TestWebHDFS#testLargeFile fails intermittently (Sub-task; Open; Yongjun Zhang)
        47. Use JUnit Parameterized test suite in TestWriteReadStripedFile (Sub-task; Patch Available; Huafeng Wang)
        48. TestErasureCodeBenchmarkThroughput#testECReadWrite fails intermittently (Sub-task; Open; Unassigned)
        49. TestStartup#testStorageBlockContentsStaleAfterNNRestart fails intermittently (Sub-task; Open; Ajith S; time spent 40m)
        50. TestDirectoryScanner#testThrottling fails: Throttle is too permissive (Sub-task; Patch Available; Daniel Templeton; time spent 50m)
        51. TestDecommission.testIncludeByRegistrationName fails intermittently (Sub-task; Patch Available; Binglin Chang)
        52. TestRetryCacheWithHA#testUpdatePipeline fails intermittently (Sub-task; Patch Available; Ranith Sardar)
        53. TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops (Sub-task; Patch Available; Ratandeep Ratti)
        54. TestTransferFsImage#testClientSideException fails intermittently (Sub-task; Open; Unassigned)
        55. TestReconstructStripedFile.testNNSendsErasureCodingTasks fails occasionally (Sub-task; Open; Unassigned)
        56. TestBalancer#testUnknownDatanode occasionally fails in trunk (Sub-task; Reopened; Unassigned)
        57. Refactor TestBalancer for faster execution. (Sub-task; Open; Unassigned)
        58. TestRouterRpcMultiDestination#testErasureCoding fails on trunk (Sub-task; Open; Fengnan Li)
        59. TestStripedFileAppend#testAppendToNewBlock fails on trunk (Sub-task; Open; Takanobu Asanuma)
        60. TestBlockTokenWithDFSStriped errors port binding (Sub-task; Open; Unassigned)
        61. TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout (Sub-task; Open; Hrishikesh Gadre)
        62. TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk intermittently (Sub-task; Open; Unassigned)
        63. TestFsDatasetImpl#testDnRestartWithHardLink fails intermittently (Sub-task; Resolved; Unassigned)
        64. TestNamenodeCapacityReport#testXceiverCount is flaky (Sub-task; Open; Unassigned)
        65. TestStandbyCheckpoints#testCheckpointBeforeNameNodeInitializationIsComplete fails intermittently (Sub-task; Open; Unassigned)
        66. TestJournalNodeRespectsBindHostKeys fails consistently on branch-2.10 (Sub-task; Open; Unassigned)
        67. TestObservernode#testMkdirsRaceWithObserverRead is flaky (Sub-task; Open; Unassigned)
        68. TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails (Sub-task; Open; Unassigned)
        69. TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails (Sub-task; Open; Unassigned)
        70. TestHDFSFileSystemContract#testAppend fails (Sub-task; Resolved; secfree; time spent 1h)
        71. TestBlockTokenWithDFSStriped#testEnd2End fails (Sub-task; Open; Unassigned; time spent 40m)
        72. TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails (Sub-task; Open; Unassigned)
        73. Fix TestDataNodeMetrics#testReceivePacketSlowMetrics (Sub-task; Resolved; Haiyang Hu; time spent 2h 40m)
        74. De-flake TestRollingUpgrade#testRollback (Sub-task; Resolved; Viraj Jasani; time spent 2h 10m)


          People

            Assignee: Unassigned
            Reporter: Ahmed Hussein (ahussein)

            Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 78h
