Hadoop YARN / YARN-8991

nodemanager not cleaning blockmgr directories inside appcache

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:
      None

      Description

      Hi, I'm running Spark on YARN with the Spark Shuffle Service enabled. Over the lifetime of my Spark Streaming application, the NM appcache folder keeps accumulating blockmgr directories (filled with shuffle_*.data files).

      Looking at the NM logs, it seems the blockmgr directories are not part of the application's cleanup process. Eventually the disk fills up and the app crashes. I have both yarn.nodemanager.localizer.cache.cleanup.interval-ms and yarn.nodemanager.localizer.cache.target-size-mb set, so I don't think it's a configuration issue.

      What is stumping me is that the executor ID listed by Spark during the external shuffle block registration doesn't match the executor ID listed in YARN's NM log. Maybe this executor ID mismatch explains why the cleanup is not done? I'm assuming that blockmgr directories are supposed to be cleaned up?

       

      2018-11-05 15:01:21,349 INFO org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: Registered executor AppExecId{appId=application_1541045942679_0193, execId=1299} with ExecutorShuffleInfo{localDirs=[/mnt1/yarn/nm/usercache/auction_importer/appcache/application_1541045942679_0193/blockmgr-b9703ae3-722c-47d1-a374-abf1cc954f42], subDirsPerLocalDir=64, shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
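
      A rough sketch for pulling the application ID, executor ID, and local directories out of the registration line above, so they can be compared by hand with the IDs in the NM log. The regex is modeled only on the single log line quoted here, and `parse_registration` is a name made up for illustration, not part of Spark or YARN.

```python
import re

# Pattern modeled on the one ExternalShuffleBlockResolver line quoted in this
# issue; other Spark versions may format the line differently (assumption).
REGISTRATION = re.compile(
    r"Registered executor AppExecId\{appId=(?P<app_id>[^,]+), "
    r"execId=(?P<exec_id>[^}]+)\} "
    r"with ExecutorShuffleInfo\{localDirs=\[(?P<local_dirs>[^\]]*)\]"
)


def parse_registration(line):
    """Return app ID, executor ID and local dirs from a registration line.

    Returns None if the line does not look like a registration line.
    """
    match = REGISTRATION.search(line)
    if match is None:
        return None
    dirs = [d.strip() for d in match.group("local_dirs").split(",") if d.strip()]
    return {
        "app_id": match.group("app_id"),
        "exec_id": match.group("exec_id"),
        "local_dirs": dirs,
    }
```

      Running this over the shuffle service log gives one record per registered executor, which makes it easier to see which blockmgr directory belongs to which executor ID.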
      
       

       

      This seems similar to https://issues.apache.org/jira/browse/YARN-7070, although I'm not sure whether the behavior I'm seeing is related to my use of Spark.

      https://stackoverflow.com/questions/52923386/spark-streaming-job-doesnt-delete-shuffle-files describes a stopgap solution: cleaning up via cron.
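
      For reference, a minimal sketch of what such a cron-driven cleanup could look like. This is not the script from the linked answer. The function names, the age threshold, and the assumption that old shuffle_* files under blockmgr-* directories are safe to delete are all mine; removing a file that a running job still needs can cause shuffle fetch failures, so the default is a dry run.

```python
import os
import time


def find_stale_shuffle_files(appcache_root, max_age_seconds, now=None):
    """List shuffle_* files under blockmgr-* dirs older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(appcache_root):
        # Only consider paths that pass through a blockmgr-* directory.
        rel_parts = os.path.relpath(dirpath, appcache_root).split(os.sep)
        if not any(part.startswith("blockmgr-") for part in rel_parts):
            continue
        for name in filenames:
            if not name.startswith("shuffle_"):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished between listing and stat
            if now - mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)


def remove_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True only report them."""
    handled = []
    for path in paths:
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue  # skip files that could not be removed
        handled.append(path)
    return handled
```

      A cron entry would call `find_stale_shuffle_files` on the appcache directory with a threshold comfortably longer than any shuffle the streaming job still reads, review the dry-run output, and only then pass `dry_run=False`.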

       

        Attachments

        1. yarn-nm-log.txt
          3 kB
          Hidayat Teonadi

            People

            • Assignee:
              Unassigned
              Reporter:
              teonadi Hidayat Teonadi
            • Votes:
              0
              Watchers:
              2
