  Apache Tez / TEZ-2366

Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      Around 20 unit tests (out of roughly 2000) fail intermittently after TEZ-2333. Here is a stack trace:

      org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index in any of the configured local directories
              at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
              at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
              at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
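The top frames show LocalDirAllocator.getLocalPathToRead being asked for the relative path output/<attempt>/file.out.index and failing because none of the configured local directories contains it. The following is a minimal, self-contained sketch of that lookup contract, assuming only what the error message states (search each configured directory in turn, throw if the file is in none). LocalPathLookup and its nested DiskErrorException are hypothetical stand-ins, not Hadoop's actual LocalDirAllocator or DiskChecker code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LocalPathLookup {

    // Hypothetical stand-in for org.apache.hadoop.util.DiskChecker$DiskErrorException.
    public static class DiskErrorException extends IOException {
        public DiskErrorException(String msg) {
            super(msg);
        }
    }

    // Sketch of the read-side lookup: return the first configured local dir
    // that contains relPath, otherwise fail with a message shaped like the trace.
    public static Path getLocalPathToRead(String relPath, List<Path> localDirs)
            throws DiskErrorException {
        for (Path dir : localDirs) {
            Path candidate = dir.resolve(relPath);
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        throw new DiskErrorException("Could not find " + relPath
                + " in any of the configured local directories");
    }

    public static void main(String[] args) throws IOException {
        Path dirA = Files.createTempDirectory("local-a");
        Path dirB = Files.createTempDirectory("local-b");
        String rel = "output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index";

        // Simulate the producer writing its index file under dirB only.
        Path written = dirB.resolve(rel);
        Files.createDirectories(written.getParent());
        Files.createFile(written);

        // A reader configured with both directories finds the file: prints true.
        System.out.println(getLocalPathToRead(rel, List.of(dirA, dirB)).equals(written));

        // A reader configured with only dirA fails the same way as the trace above.
        try {
            getLocalPathToRead(rel, List.of(dirA));
        } catch (DiskErrorException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch only shows the failure mode (the reader's view of the local directories does not contain the file the fetcher asks for); it does not assert which of the possible causes applies in this issue.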
      

      To reproduce this with the Pig tests, use the following commands:
      svn co http://svn.apache.org/repos/asf/pig/trunk
      ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test

      Note that the Pig codebase already sets TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "true" (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig, but that does not help.

        Attachments

        1. TEZ-2366.wip.1.patch
          4 kB
          Prakash Ramachandran
        2. TEZ-2366.test.txt
          1.0 kB
          Siddharth Seth
        3. TEZ-2366.4.patch
          26 kB
          Prakash Ramachandran
        4. TEZ-2366.3.patch
          24 kB
          Prakash Ramachandran
        5. TEZ-2366.2.patch
          23 kB
          Prakash Ramachandran
        6. TEZ-2366.1.patch
          23 kB
          Prakash Ramachandran


            People

            • Assignee:
              Prakash Ramachandran (pramachandran)
            • Reporter:
              Daniel Dai (daijy)
            • Votes:
              0
            • Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved: