YARN-882: Specify per user quota for private/application cache and user log files

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      At present there is no limit on the number or total size of the files localized by a single user. Similarly, there is no limit on the size of the log files created by a user's running containers.

      We need to restrict users in this regard.
      For LocalizedResources this is a serious concern in a secured environment, where a malicious user can start a single container and localize resources whose total size >= DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB. Thereafter localization will either fail (if no extra space is left on disk) or the deletion service will keep removing localized files belonging to other containers/applications.
      The limits for logs/localized resources should be decided by the RM and sent to the NM via the secured containerToken. All these configurations should be per container instead of per user or per NM.
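
      For reference, a minimal yarn-site.xml sketch of the existing NM-wide cache knobs mentioned above (values are illustrative defaults; note they bound the whole NodeManager cache, not a single container or user, and do not cover data a running container writes into its appcache):

        <property>
          <!-- NM-wide target size of the localized resource cache
               (corresponds to DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB) -->
          <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
          <value>10240</value>
        </property>
        <property>
          <!-- how often the cache cleanup / deletion service is triggered -->
          <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
          <value>600000</value>
        </property>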

        Issue Links

          Activity

          karasing Karan Singh added a comment -

          Desperately need this.

          For a long-running job the appcache directory size keeps increasing, eventually marking all nodes as unhealthy.

          # du -sh /home/usercache/root/appcache/application_1500063393962_3622/
          20G	/home/usercache/root/appcache/application_1500063393962_3622/
          
          # du -sh /home/usercache/root/appcache/application_1500063393962_3622/
          166G	/home/usercache/root/appcache/application_1500063393962_3622/
          
          karasing Karan Singh added a comment - edited

          Currently yarn.nodemanager.localizer.cache.target-size-mb and yarn.nodemanager.localizer.cache.cleanup.interval-ms trigger the deletion service for non-running containers.

          However, for containers that are running and spilling data to

          ${yarn.nodemanager.local-dirs}/usercache/<user>/appcache/<app_id>

          the deletion service does not come into action; as a result the filesystem gets full, nodes are marked unhealthy, and applications get stuck.

          An interim solution is to provision large storage for local-dirs. As a long-term solution, a user quota should be specified for the private/application cache.
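
          To illustrate the interim workaround, local-dirs can simply be spread over several large volumes in yarn-site.xml (the mount points below are hypothetical):

            <property>
              <!-- spread usercache/appcache load over multiple large disks;
                   /data1..3 are hypothetical mount points -->
              <name>yarn.nodemanager.local-dirs</name>
              <value>/data1/yarn/local,/data2/yarn/local,/data3/yarn/local</value>
            </property>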

          Naganarasimha Naganarasimha G R added a comment -

          Karan Singh,

          For long term solution, user quota should be specified for private/application cache.

          Agreed, but it should be handled in such a way that containers do not fail after submission; instead, allocation should simply stop happening on these nodes.
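
          A related existing mechanism is the NM disk health checker, which marks a node unhealthy (and therefore unschedulable) once its local/log dirs fill past a threshold; a minimal yarn-site.xml sketch with illustrative values:

            <property>
              <!-- a local/log dir counts as bad once it is more than 90% full -->
              <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
              <value>90.0</value>
            </property>
            <property>
              <!-- additionally require this much free space (MB) per disk -->
              <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
              <value>1024</value>
            </property>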

          karasing Karan Singh added a comment -

          +1

          rkrassow Rostislaw Krassow added a comment -

          I hit the same issue in production. During execution of a heavy Hive join (with the MapReduce execution engine) the corresponding ${yarn.nodemanager.local-dirs}/usercache/<user>/appcache/<app_id> kept growing. This led to the nodes being eliminated by the RM.

          The quotas for private/application cache should reflect resource quotas for the defined YARN queues.


            People

            • Assignee: ojoshi Omkar Vinit Joshi
            • Reporter: ojoshi Omkar Vinit Joshi
            • Votes: 3
            • Watchers: 13

              Dates

              • Created:
                Updated:

                Development