Hadoop HDFS / HDFS-15646

Track failing tests in HDFS


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None

    Description

      Several unit tests have been failing consistently on Yetus for a long period of time.
      The list keeps growing and it is driving the repository into an unstable state. Qbt reports more than 40 failing unit tests on average.

      Personally, over the last week, with every submitted patch, I have had to spend considerable time looking at the same stack traces to double-check whether or not the patch contributes to those failures.

      I found out that the majority of those tests have been failing for quite some time, but no Jiras were filed.

      The main problem with those consistent failures is that they have side effects on the runtime of the other JUnit tests, because they consume resources such as memory and ports.
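
      One common way to keep a test from holding or colliding on a fixed port is to let the operating system pick a free one. The snippet below is a generic sketch of that idea using Python's standard socket module. It is not taken from the HDFS test code, and bind_ephemeral is a hypothetical helper name.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-chosen free port.

    Passing port 0 asks the kernel for any unused port, so two tests
    running at the same time do not compete for a hard-coded number.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(1)
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    sock, port = bind_ephemeral()
    # The context manager closes the socket, releasing the port for others.
    with sock:
        print("listening on port", port)
```

      The important part is closing the socket when the test ends. A test that fails or hangs without releasing its socket is what leaves the port unavailable to the tests that run after it.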

      StripedFile and EC tests in particular show up in the list of failing tests every time.
      I looked at those tests and they certainly need some improvements (e.g., HDFS-15459). Is anyone interested in those test cases? Can we just turn them off?

      I would like to give a heads-up that we need more collaboration to enforce the stability of the codebase.

      • All developers: please file a Jira whenever you see a failing test, whether or not it is related to your patch. This gives other developers a heads-up about potential failures. Please do not stop at commenting "this is unrelated to my work" on your patch.
      • Volunteer to dedicate more time to fixing flaky tests.
      • Periodically make sure that the number of failing tests does not exceed a certain threshold. We have Qbt reports to monitor that, but there is no follow-up on their status.
      • We should consider aggressive strategies, such as blocking all merges until the code is brought back to stability.
      • We need a clear and well-defined process to address Yetus issues: configuration, investigating out-of-memory errors, slowness, etc.
      • Turn off the JUnit tests within modules that are not being actively used in the community (e.g., EC, striped files, etc.).
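
      The periodic check suggested in the list above could be sketched roughly as follows. This is only an illustration: the MAX_FAILING threshold, the check_failing_tests helper, and the idea of feeding it a plain list of fully qualified test names copied from a Qbt report are all assumptions, not an existing Yetus or Jenkins feature.

```python
from collections import Counter

# Hypothetical threshold; the issue does not name a specific number.
MAX_FAILING = 40


def check_failing_tests(failing, limit=MAX_FAILING):
    """Group failing tests by class and report whether the limit is exceeded.

    `failing` is a list of fully qualified names of the form
    package.Class.method, e.g. copied by hand from a Qbt report.
    Returns (limit_exceeded, Counter mapping class name to failure count).
    """
    per_class = Counter(name.rsplit(".", 1)[0] for name in failing)
    return len(failing) > limit, per_class


if __name__ == "__main__":
    sample = [
        "org.apache.hadoop.fs.azure.TestWasbFsck.testDelete",
        "org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata",
        "org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata",
    ]
    exceeded, per_class = check_failing_tests(sample, limit=2)
    print("limit exceeded:", exceeded)
    for cls, count in per_class.most_common():
        print(f"{count:3d}  {cls}")
```

      Grouping by class makes it easier to see when most failures come from a single module, which is the kind of follow-up the Qbt reports currently lack.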

       

      CC: aajisaka, elgoiri, kihwal, daryn, weichiu

      Do you have any thoughts on the current status of HDFS?

       

      The following is a quick list of failing JUnit tests from Qbt reports:

       

       Test name | Duration | Age
       org.apache.hadoop.crypto.key.kms.server.TestKMS.testKMSProviderCaching | 1.5 sec | 1
       org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata | 42 ms | 3
       org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata | 46 ms | 3
       org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata | 27 ms | 3
       org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata | 19 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testNoTempBlobsVisible | 0.95 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testLinkBlobs | 33 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatusRootDir | 31 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryMoveToExistingDirectory | 0.25 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatus | 29 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryAsExistingDirectory | 36 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameToDirWithSamePrefixAllowed | 23 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testLSRootDir | 19 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testDeleteRecursively | 31 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck.testWasbFsck | 1 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testChineseCharactersFolderRename | 1 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListingWithZeroByteRenameMetadata | 41 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListing | 37 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testUriEncoding | 38 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testDeepFileCreation | 37 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testListDirectory | 29 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderRenameInProgress | 37 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameFolder | 34 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameImplicitFolder | 27 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolder | 66 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testStoreDeleteFolder | 27 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRename | 40 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatus | 36 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testRenameDirectoryAsEmptyDirectory | 0.26 sec | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatusFilterWithSomeMatches | 23 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testRenameDirectoryAsNonExistentDirectory | 28 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testGlobStatusSomeMatchesInDirectories | 26 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testGlobStatusWithMultipleWildCardMatches | 27 ms | 3
       org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testDeleteRecursively | 22 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testImplicitFolderDeleted | 0.99 sec | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testFileAndImplicitFolderSameName | 31 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testSetOwnerOnImplicitFolder | 26 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testFileInImplicitFolderDeleted | 30 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testImplicitFolderListed | 22 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testCreatingDeepFileCreatesExplicitFolder | 53 ms | 3
       org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations.testSetPermissionOnImplicitFolder | 22 ms | 3
       org.apache.hadoop.fs.azure.TestWasbFsck.testDelete | 1 sec | 3
       org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers | 1 min 30 sec | 17
       org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType

       

       


        Issue Links

          1. RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk | Sub-task | Reopened | Fengnan Li
          2. TestWebHDFS#testLargeFile fails intermittently | Sub-task | Open | Yongjun Zhang
          3. Use JUnit Parameterized test suite in TestWriteReadStripedFile | Sub-task | Patch Available | Huafeng Wang
          4. TestErasureCodeBenchmarkThroughput#testECReadWrite fails intermittently | Sub-task | Open | Unassigned
          5. TestStartup#testStorageBlockContentsStaleAfterNNRestart fails intermittently | Sub-task | Open | Ajith S (Time Spent: 40m)
          6. TestDirectoryScanner#testThrottling fails: Throttle is too permissive | Sub-task | Patch Available | Daniel Templeton (Time Spent: 50m)
          7. TestDecommission.testIncludeByRegistrationName fails intermittently | Sub-task | Patch Available | Binglin Chang
          8. TestRetryCacheWithHA#testUpdatePipeline fails intermittently | Sub-task | Patch Available | Ranith Sardar
          9. TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops | Sub-task | Patch Available | Ratandeep Ratti
          10. TestTransferFsImage#testClientSideException fails intermittently | Sub-task | Open | Unassigned
          11. TestReconstructStripedFile.testNNSendsErasureCodingTasks fails occasionally | Sub-task | Open | Unassigned
          12. TestBalancer#testUnknownDatanode occasionally fails in trunk | Sub-task | Reopened | Unassigned
          13. Refactor TestBalancer for faster execution. | Sub-task | Open | Unassigned
          14. TestRouterRpcMultiDestination#testErasureCoding fails on trunk | Sub-task | Open | Fengnan Li
          15. TestStripedFileAppend#testAppendToNewBlock fails on trunk | Sub-task | Open | Takanobu Asanuma
          16. TestBlockTokenWithDFSStriped errors port binding | Sub-task | Open | Unassigned
          17. TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout | Sub-task | Open | Hrishikesh Gadre
          18. TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk intermittently | Sub-task | Open | Unassigned
          19. TestNamenodeCapacityReport#testXceiverCount is flaky | Sub-task | Open | Unassigned
          20. TestStandbyCheckpoints#testCheckpointBeforeNameNodeInitializationIsComplete fails intermittently | Sub-task | Open | Unassigned
          21. TestJournalNodeRespectsBindHostKeys fails consistently on branch-2.10 | Sub-task | Open | Unassigned
          22. TestObservernode#testMkdirsRaceWithObserverRead is flaky | Sub-task | Open | Unassigned
          23. TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails | Sub-task | Open | Unassigned
          24. TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails | Sub-task | Open | Unassigned
          25. TestBlockTokenWithDFSStriped#testEnd2End fails | Sub-task | Open | Unassigned (Time Spent: 40m)
          26. TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails | Sub-task | Open | Unassigned


            People

              Assignee: Unassigned
              Reporter: Ahmed Hussein (ahussein)
              Votes: 0
              Watchers: 20


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 78h