Hadoop YARN / YARN-527

Local filecache mkdir fails


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None
    • Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two NameNodes and six worker nodes.

    Description

      Jobs failed with no explanation other than this stack trace:

      2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_000000_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
      at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
      at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
      at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
      at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
      at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
      at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
      at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
      at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
      at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

      Manually creating the directory worked. The same failure occurred on at least several nodes in the cluster.

      The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes.

      It is unclear whether YARN struggled with the number of files or whether there were corrupt files in the caches. The situation was triggered by a node dying.
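
      For reference, a minimal, hypothetical Java sketch of the call that fails in the trace above: FSDownload.call() creates the per-resource directory under the local filecache through FileContext.mkdir, and FileSystem.primitiveMkdir throws the "mkdir of ... failed" IOException when the underlying mkdir returns false. The class name, the choice of FileContext.getLocalFSFileContext(), and FsPermission.getDefault() are illustrative assumptions, not taken from FSDownload itself; the path is the one from the failing log line.

      import org.apache.hadoop.fs.FileContext;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.permission.FsPermission;

      public class FilecacheMkdirCheck {
          public static void main(String[] args) throws Exception {
              // Path copied from the failing log line; adjust to the affected local dir.
              Path dest = new Path("/disk3/yarn/local/filecache/-4230789355400878397");

              // Local file context, the same abstraction the localizer goes through
              // (FileContext.mkdir -> DelegateToFileSystem.mkdir -> FileSystem.primitiveMkdir).
              FileContext lfs = FileContext.getLocalFSFileContext();

              // If the underlying mkdir returns false (for example, a stale or unwritable
              // parent entry), this throws java.io.IOException: "mkdir of <path> failed",
              // matching the trace in the description.
              lfs.mkdir(dest, FsPermission.getDefault(), true);

              System.out.println("mkdir succeeded for " + dest);
          }
      }

      Running such a check as the YARN user on an affected node can help distinguish a permissions or filesystem problem from a problem with the filecache contents themselves.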

      Attachments

        1. yarn-site.xml (3 kB, attached by Knut O. Hellan)


            People

              Assignee: Unassigned
              Reporter: Knut O. Hellan (khellan)
              Votes: 0
              Watchers: 2
