Apache Tez / TEZ-2366

Pig Tez MiniTezCluster unit tests fail intermittently after TEZ-2333


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      Around 20 unit tests (out of roughly 2000) fail intermittently after TEZ-2333. Here is a stack trace:

      org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index in any of the configured local directories
              at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
              at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
              at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
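      To make the error message concrete, here is a minimal, simplified model of what a read-side lookup such as LocalDirAllocator.getLocalPathToRead does: it probes each configured local directory for the relative path and fails only if none of them contains the file. The class LocalPathLookup and its method are hypothetical, and this is a stdlib sketch assuming a plain list of directories, not Hadoop's actual implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, simplified model of a multi-directory read lookup.
public class LocalPathLookup {

    // Returns the first configured directory entry that contains relPath.
    static Path getLocalPathToRead(String relPath, List<Path> localDirs) throws IOException {
        for (Path dir : localDirs) {
            Path candidate = dir.resolve(relPath);
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        // No directory holds the file, so the read side gives up with an error
        // shaped like the one in the stack trace above.
        throw new IOException("Could not find " + relPath
                + " in any of the configured local directories");
    }

    public static void main(String[] args) throws IOException {
        Path dir1 = Files.createTempDirectory("local1");
        Path dir2 = Files.createTempDirectory("local2");
        String rel = "output/attempt_x_10003/file.out.index";

        // Only the second directory contains the index file.
        Files.createDirectories(dir2.resolve(rel).getParent());
        Files.createFile(dir2.resolve(rel));
        System.out.println(getLocalPathToRead(rel, List.of(dir1, dir2)).startsWith(dir2));

        // A path present in no directory triggers the failure case.
        try {
            getLocalPathToRead("output/missing/file.out.index", List.of(dir1, dir2));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Under this model, the exception means that when the Fetcher looked, none of the local directories it consulted contained output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index.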
      

      To reproduce this with the Pig tests, run the following commands:
      svn co http://svn.apache.org/repos/asf/pig/trunk
      ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test

      Note that the Pig codebase already sets TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "true" (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). Changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig did not help.

      Attachments

        1. TEZ-2366.4.patch (26 kB, Prakash Ramachandran)
        2. TEZ-2366.3.patch (24 kB, Prakash Ramachandran)
        3. TEZ-2366.2.patch (23 kB, Prakash Ramachandran)
        4. TEZ-2366.1.patch (23 kB, Prakash Ramachandran)
        5. TEZ-2366.wip.1.patch (4 kB, Prakash Ramachandran)
        6. TEZ-2366.test.txt (1.0 kB, Siddharth Seth)


          People

            Assignee: Prakash Ramachandran (pramachandran)
            Reporter: Daniel Dai (daijy)
            Votes: 0
            Watchers: 4

