HBase / HBASE-6891 (Hadoop 2 unit test failures) / HBASE-8419

Hadoop2 MR tests fail with delete failing/hanging threads present

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Tests such as the following fail intermittently (flakily) on Hadoop 2 runs:

      • TestImportTsv/testBulkOutputWithoutAnExistingTable
      • TestImportTsv/testMROnTable
      • TestImportExport/testWithFilter
      • (and many others)

      The logs from these runs show potentially hanging threads and failed file deletions, for example:

      2013-04-24 06:05:01,807 WARN  [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(193): Exit code from task is : 137
      2013-04-24 06:05:06,520 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): after: mapreduce.TestImportExport#testExportScannerBatching Thread=539 (was 534)
      Potentially hanging thread: hbase-table-pool-25-thread-1
      	sun.misc.Unsafe.park(Native Method)
      	java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
      	java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
      ...
      <threads seemingly related to dfs connection>
      
      2013-04-24 06:03:28,351 WARN  [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_0/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
      2013-04-24 06:03:28,353 WARN  [DeletionService #1] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_1/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
      2013-04-24 06:03:28,353 WARN  [DeletionService #2] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_2/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
      2013-04-24 06:03:28,354 WARN  [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_3/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
      
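      The "Thread=539 (was 534)" line reports the live-thread count after the test next to the count before it, and each thread that appeared in between is listed as "Potentially hanging". The class below is a minimal sketch of that before/after comparison, assuming only the JDK's Thread.getAllStackTraces(); it is not HBase's actual ResourceChecker code, and the class name ThreadLeakCheck and the thread name "leaky-pool-thread-1" are hypothetical. Running a snapshot like this around a single flaky test (for example TestImportTsv/testMROnTable) is one way to see which pools survive the MiniMRCluster shutdown.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper: a before/after thread snapshot, similar in spirit to
// the "Thread=N (was M)" / "Potentially hanging thread" log lines above.
public class ThreadLeakCheck {

    /** Records the ids of all threads that are alive right now. */
    static Set<Long> snapshotThreadIds() {
        Set<Long> ids = new HashSet<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ids.add(t.getId());
        }
        return ids;
    }

    /** Returns the names of live threads whose ids were not in the "before" snapshot. */
    static Set<String> newThreadNames(Set<Long> before) {
        Set<String> names = new TreeSet<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !before.contains(t.getId())) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Long> before = snapshotThreadIds();
        int countBefore = before.size();

        // Simulate a test that leaves a parked worker thread behind.
        CountDownLatch release = new CountDownLatch(1);
        Thread leaked = new Thread(() -> {
            try {
                release.await(); // parks, like the pool thread in the log
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }, "leaky-pool-thread-1");
        leaked.setDaemon(true);
        leaked.start();

        Set<String> suspects = newThreadNames(before);
        int countAfter = Thread.getAllStackTraces().size();
        System.out.println("Thread=" + countAfter + " (was " + countBefore + ")");
        for (String name : suspects) {
            System.out.println("Potentially hanging thread: " + name);
        }

        // Clean up so the simulated leak does not outlive the demo.
        release.countDown();
        leaked.join();
    }
}
```

      A thread flagged this way is only a suspect: it may still be shutting down asynchronously, which is why the log says "potentially" hanging rather than leaked.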


            People

            • Assignee:
              Unassigned
              Reporter:
              Jonathan Hsieh (jmhsieh)
            • Votes:
              0
              Watchers:
              7

              Dates

              • Created:
                Updated:
                Resolved: