HBase / HBASE-6891 Hadoop 2 unit test failures / HBASE-8419

Hadoop2 MR tests fail with delete failing/hanging threads present

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce

    Description

      Flaky failures occur on hadoop2 runs of tests such as:

      • TestImportTsv/testBulkOutputWithoutAnExistingTable
      • TestImportTsv/testMROnTable
      • TestImportExport/testWithFilter
      • (and many others)

      The logs from these runs show hanging threads and failed file deletes like the following:

      2013-04-24 06:05:01,807 WARN  [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(193): Exit code from task is : 137
      2013-04-24 06:05:06,520 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): after: mapreduce.TestImportExport#testExportScannerBatching Thread=539 (was 534)
      Potentially hanging thread: hbase-table-pool-25-thread-1
      	sun.misc.Unsafe.park(Native Method)
      	java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
      	java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
      ...
      <threads seemingly related to dfs connection>
      
      2013-04-24 06:03:28,351 WARN  [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_0/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
      2013-04-24 06:03:28,353 WARN  [DeletionService #1] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_1/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
      2013-04-24 06:03:28,353 WARN  [DeletionService #2] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_2/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
      2013-04-24 06:03:28,354 WARN  [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_3/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
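
      When triaging logs like these, it can help to pull out the three signals quoted above (container exit codes, failed deletes, and potentially hanging threads) in one pass. The following is a minimal sketch, not part of HBase or Hadoop: the function names `summarize` and `signal_from_exit_code` are hypothetical, and the regular expressions are simply derived from the log lines in this report. Reading exit code 137 as 128 + 9 (SIGKILL) follows the common shell convention and is an assumption about what happened to the container here.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this report (an assumption:
# other Hadoop versions may word these messages differently).
EXIT_CODE_RE = re.compile(r"Exit code from task is : (\d+)")
DELETE_FAIL_RE = re.compile(r"delete returned false for path: \[(.+?)\]")
HANGING_RE = re.compile(r"Potentially hanging thread: (\S+)")


def summarize(lines):
    """Collect exit codes, failed-delete paths, and hanging thread names."""
    exit_codes = Counter()
    failed_deletes = []
    hanging_threads = []
    for line in lines:
        m = EXIT_CODE_RE.search(line)
        if m:
            exit_codes[int(m.group(1))] += 1
        m = DELETE_FAIL_RE.search(line)
        if m:
            failed_deletes.append(m.group(1))
        m = HANGING_RE.search(line)
        if m:
            hanging_threads.append(m.group(1))
    return exit_codes, failed_deletes, hanging_threads


def signal_from_exit_code(code):
    """By shell convention, an exit code above 128 means 'killed by signal code - 128'."""
    return code - 128 if code > 128 else None
```

      Feeding the lines of a test log to `summarize` returns a counter of exit codes plus the lists of failed-delete paths and hanging thread names; `signal_from_exit_code(137)` yields 9, which would be consistent with the container having been killed with SIGKILL.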
      

          People

            Assignee: Unassigned
            Reporter: Jonathan Hsieh (jmhsieh)