Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The scenario is as follows:
We will certainly have multiple applications running on top of YARN. Whenever users run these applications, their resources need to be localized. The options application users currently have for localizing resources are:
- APPLICATION: the files are available only to that instance of the application and only to that single user. In MR terms, that means a single job.
- PRIVATE: the files are available only to that user, across multiple runs of the application. Other users cannot take advantage of them, so the same file is replicated once per user, which wastes space in the local resource cache.
- PUBLIC: there is only one copy of each file of the application (say APP_1). That is good in the sense that the files are accessible to all users. On secure clusters, however, containers of a different application (say APP_2) can then easily access APP_1's private files and potentially modify them.
So clearly none of the existing RESOURCE_LOCALIZATION_TYPES solves the above problem while also using cache space efficiently. Therefore we need something like GROUP to address this scenario.
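To make the gap concrete, here is a minimal, self-contained Java sketch of the access rules described above. It is not the YARN API: the class `LocalizationVisibilitySketch`, its `CachedResource` type, and the `canUse` method are hypothetical names used only for illustration. The semantics of GROUP are also an assumption, since the proposal does not define them yet; the sketch treats GROUP as "one cached copy shared by an explicit set of users".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the visibility rules discussed in this issue.
// None of these types are part of YARN.
final class LocalizationVisibilitySketch {

    // APPLICATION, PRIVATE and PUBLIC are the options listed above.
    // GROUP is the proposed addition; its meaning here is an assumption.
    enum Visibility { APPLICATION, PRIVATE, GROUP, PUBLIC }

    // One entry in the local resource cache, plus who localized it.
    static final class CachedResource {
        final String path;
        final Visibility visibility;
        final String ownerUser;
        final String ownerAppId;
        final Set<String> groupUsers; // only consulted for GROUP

        CachedResource(String path, Visibility visibility, String ownerUser,
                       String ownerAppId, Set<String> groupUsers) {
            this.path = path;
            this.visibility = visibility;
            this.ownerUser = ownerUser;
            this.ownerAppId = ownerAppId;
            this.groupUsers = groupUsers;
        }
    }

    // Returns true if a container run by `user` for application instance
    // `appId` may reuse the cached copy instead of localizing its own.
    static boolean canUse(CachedResource r, String user, String appId) {
        switch (r.visibility) {
            case PUBLIC:
                return true; // single copy, but open to every user
            case GROUP:
                return r.groupUsers.contains(user); // single copy, limited audience
            case PRIVATE:
                return r.ownerUser.equals(user); // one copy per user
            case APPLICATION:
                return r.ownerUser.equals(user) && r.ownerAppId.equals(appId);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        Set<String> app1Users = new HashSet<>(Arrays.asList("alice", "bob"));
        CachedResource jar = new CachedResource(
                "app1.jar", Visibility.GROUP, "alice", "app_1_run_1", app1Users);
        System.out.println("bob can reuse: " + canUse(jar, "bob", "app_1_run_2"));
        System.out.println("mallory can reuse: " + canUse(jar, "mallory", "app_2_run_1"));
    }
}
```

Under these assumptions, GROUP keeps the single-copy benefit of PUBLIC for the users of APP_1 while denying access to users outside the group, which is the combination that none of the three existing options provides.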
Thoughts??