  1. Hadoop YARN
  2. YARN-543 [Umbrella] NodeManager localization related issues
  3. YARN-99

Jobs fail during resource localization when private distributed-cache hits unix directory limits

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha, 3.0.0-alpha1
    • Fix Version/s: 2.1.0-beta
    • Component/s: nodemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If multiple jobs use the distributed cache with many small files, the per-directory limit of the underlying Unix file system is reached before the configured cache size is, and the NodeManager can no longer create any directories in the file cache. Because the size-based cleanup never triggers, the jobs start failing with the exception below.

      java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
      	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
      	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
      	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
      	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
      	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
      	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
      	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      

      We should have a mechanism that cleans up cached files when the number of directories crosses a specified limit, similar to the existing limit on cache size.
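
      A minimal sketch of what such a count-based cleanup could look like, written here only for illustration. The class CountBoundedCache and its methods are hypothetical and are not part of Hadoop or of the attached patches. It assumes least-recently-used eviction and ignores reference counting, so a real NodeManager would additionally have to skip resources that running containers still use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: tracks cache directories and evicts by count, not size. */
public class CountBoundedCache {
    private final int maxEntries;
    // Access-ordered map: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Long> entries =
        new LinkedHashMap<>(16, 0.75f, true);

    public CountBoundedCache(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        this.maxEntries = maxEntries;
    }

    /**
     * Records a newly localized directory and returns the directories the
     * caller should delete so that the entry count stays within the limit.
     */
    public synchronized List<String> add(String dir, long sizeBytes) {
        entries.put(dir, sizeBytes);
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (entries.size() > maxEntries && it.hasNext()) {
            evicted.add(it.next().getKey());
            it.remove(); // drop the least recently used entry
        }
        return evicted;
    }

    /** Marks a directory as recently used so it is evicted later. */
    public synchronized void touch(String dir) {
        entries.get(dir); // get() refreshes the position in access order
    }

    public synchronized int size() {
        return entries.size();
    }
}
```

      With a limit of 2, adding "a" and "b", touching "a", and then adding "c" returns "b" as the only directory to delete. The limit would have to sit below the file system's per-directory maximum for the mkdir failure above to be avoided.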

        Attachments

        1. yarn-99-20130324.patch
          47 kB
          Omkar Vinit Joshi
        2. yarn-99-20130403.patch
          47 kB
          Omkar Vinit Joshi
        3. yarn-99-20130403.1.patch
          47 kB
          Omkar Vinit Joshi
        4. yarn-99-20130408.patch
          58 kB
          Omkar Vinit Joshi
        5. yarn-99-20130408.1.patch
          58 kB
          Omkar Vinit Joshi

              People

              • Assignee:
                ojoshi Omkar Vinit Joshi
                Reporter:
                devaraj Devaraj Kavali
              • Votes:
                1
                Watchers:
                11
