Uploaded image for project: 'Bigtop'
  1. Bigtop
  2. BIGTOP-1050

Permissions on YARN LCE should be 4754

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels:
      None

      Description

      The permissions we set for the YARN container executor are not exactly correct and are different from what we used to set for the MRv1 task containers. The requirements for the permissions are as follows:

      • Readable/executable by the group
      • Not executable by others
      • Not writable by others
      • Set UID
      • Owned by root

      I've tested this in YARN and have tested that I can still submit and run jobs successfully with these new permissions. This is somewhat second-hand information, so I'll CC Aaron T. Myers in case I've missed any important details or context...

        Activity

        Hide
        apurtell Andrew Purtell added a comment -

        +1 with an ack from one of the Hadoop guys.

        Show
        apurtell Andrew Purtell added a comment - +1 with an ack from one of the Hadoop guys.
        Hide
        atm Aaron T. Myers added a comment -

        +1, looks good to me.

        The 4x5x is what's actually necessary for the whole thing to work properly, and my understanding of preferred Linux packaging guidelines suggests that x7xx is the right thing to do when the file is owned by root because lesser permissions would be pointless, and xxx4 is the right thing to do when a file should not be executable by ordinary users because lesser permissions would be pointless (the code is OSS so nothing to hide in the binary) and we can't have greater permissions and maintain correctness.

        Show
        atm Aaron T. Myers added a comment - +1, looks good to me. The 4x5x is what's actually necessary for the whole thing to work properly, and my understanding of preferred Linux packaging guidelines suggests that x7xx is the right thing to do when the file is owned by root because lesser permissions would be pointless, and xxx4 is the right thing to do when a file should not be executable by ordinary users because lesser permissions would be pointless (the code is OSS so nothing to hide in the binary) and we can't have greater permissions and maintain correctness.
        Hide
        vinodkv Vinod Kumar Vavilapalli added a comment -

        We don't need the users to be able to even read the file. So 6050:root:<special group to which user running NM is part of> is the most restrictive permission that will work just fine.

        Show
        vinodkv Vinod Kumar Vavilapalli added a comment - We don't need the users to be able to even read the file. So 6050:root:<special group to which user running NM is part of> is the most restrictive permission that will work just fine.
        Hide
        rvs Roman Shaposhnik added a comment -

        Sean Mackrory I think Vinod is absolutely right. Besides, I now see that NM is refusing to start on my test cluster:

        2013-10-09 14:42:23,336 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
        org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
        	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:148)
        	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:338)
        	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:386)
        Caused by: java.io.IOException: Linux container executor not configured properly (error=22)
        	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:153)
        	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:146)
        	... 3 more
        Caused by: org.apache.hadoop.util.Shell$ExitCodeException: Invalid permissions on container-executor binary.
        
        	at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        	at org.apache.hadoop.util.Shell.run(Shell.java:379)
        	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:147)
        	... 4 more
        2013-10-09 14:42:23,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG: 
        

        Reopening the JIRA so you can take care of it. Thanks!

        Show
        rvs Roman Shaposhnik added a comment - Sean Mackrory I think Vinod is absolutely right. Besides, I now see that NM is refusing to start on my test cluster: 2013-10-09 14:42:23,336 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:148) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:338) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:386) Caused by: java.io.IOException: Linux container executor not configured properly (error=22) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:153) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:146) ... 3 more Caused by: org.apache.hadoop.util.Shell$ExitCodeException: Invalid permissions on container-executor binary. at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:147) ... 4 more 2013-10-09 14:42:23,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG: Reopening the JIRA so you can take care of it. Thanks!
        Hide
        mackrorysd Sean Mackrory added a comment -

        I'll spin up an instance to see if I can reproduce / debug the specifics, but if this is blocking anyone and can't wait, I'm all for just reverting the change. The motivation's really academic anyway.

        Show
        mackrorysd Sean Mackrory added a comment - I'll spin up an instance to see if I can reproduce / debug the specifics, but if this is blocking anyone and can't wait, I'm all for just reverting the change. The motivation's really academic anyway.
        Hide
        rvs Roman Shaposhnik added a comment -

        +1 for reverting it.

        Show
        rvs Roman Shaposhnik added a comment - +1 for reverting it.
        Hide
        mackrorysd Sean Mackrory added a comment -

        Reverted. I'll leave this open to dig into the details a bit more. I wasn't able to reproduce the same exception but I got a different one with logging, but it wasn't broken to begin with so let's definitely leave it as it was.

        Show
        mackrorysd Sean Mackrory added a comment - Reverted. I'll leave this open to dig into the details a bit more. I wasn't able to reproduce the same exception but I got a different one with logging, but it wasn't broken to begin with so let's definitely leave it as it was.
        Hide
        atm Aaron T. Myers added a comment -

        We don't need the users to be able to even read the file. So 6050:root:<special group to which user running NM is part of> is the most restrictive permission that will work just fine.

        Sure, but do you agree that it's actually over-restrictive? x7xx is not in fact more permissive, since it's owned by root and root can read/write/execute any file in the local FS. xxx4 doesn't allow users to write or execute, and being able to read the file shouldn't impact the security of the system at all.

        4754 is what we used to have for the LTC, and I know of no good reason why the LCE should be treated differently.

        Show
        atm Aaron T. Myers added a comment - We don't need the users to be able to even read the file. So 6050:root:<special group to which user running NM is part of> is the most restrictive permission that will work just fine. Sure, but do you agree that it's actually over-restrictive? x7xx is not in fact more permissive, since it's owned by root and root can read/write/execute any file in the local FS. xxx4 doesn't allow users to write or execute, and being able to read the file shouldn't impact the security of the system at all. 4754 is what we used to have for the LTC, and I know of no good reason why the LCE should be treated differently.
        Hide
        rvs Roman Shaposhnik added a comment -

        I think the status quo makes sense for Bigtop 0.7.0. Pushing this to 0.8.0 for longer term resolution.

        Show
        rvs Roman Shaposhnik added a comment - I think the status quo makes sense for Bigtop 0.7.0. Pushing this to 0.8.0 for longer term resolution.
        Hide
        mackrorysd Sean Mackrory added a comment -

        I agree with that

        Show
        mackrorysd Sean Mackrory added a comment - I agree with that
        Hide
        atm Aaron T. Myers added a comment -

        Certainly doesn't need to be fixed urgently, as there's no material difference in functionality. The thing that bugs me about this is that we currently treat the permissions for the LTC and LCE differently, both in documentation and packaging/deployment scripts. There's no good reason for that, and it complicates the situation a bit.

        Show
        atm Aaron T. Myers added a comment - Certainly doesn't need to be fixed urgently, as there's no material difference in functionality. The thing that bugs me about this is that we currently treat the permissions for the LTC and LCE differently, both in documentation and packaging/deployment scripts. There's no good reason for that, and it complicates the situation a bit.
        Hide
        rvs Roman Shaposhnik added a comment -

        Makes sense. Lets tackle it in 0.8.0.

        Show
        rvs Roman Shaposhnik added a comment - Makes sense. Lets tackle it in 0.8.0.
        Hide
        mackrorysd Sean Mackrory added a comment -

        I just tried it out with 4750 and it worked well. Not having the 4 bit, as you point out, achieves no additional security, but at least it works

        Thoughts?

        Show
        mackrorysd Sean Mackrory added a comment - I just tried it out with 4750 and it worked well. Not having the 4 bit, as you point out, achieves no additional security, but at least it works Thoughts?
        Hide
        mackrorysd Sean Mackrory added a comment -

        Update: YARN-1796 was just committed / pushed, so it'll be in Hadoop 2.6.0 and 4754 permission will work. I tested that the 4750 pach still works if we want to go with that until we release Hadoop 2.6 packages.

        Show
        mackrorysd Sean Mackrory added a comment - Update: YARN-1796 was just committed / pushed, so it'll be in Hadoop 2.6.0 and 4754 permission will work. I tested that the 4750 pach still works if we want to go with that until we release Hadoop 2.6 packages.
        Hide
        cos Konstantin Boudnik added a comment -

        Is it still a blocker for 0.8.0? Doesn't looks like it considering that it has been sitting idle for 5 weeks. Moving to 0.9.0 for now. Feel free to move back if the patch is there and ready for commit. RC1 is building as we speak

        Show
        cos Konstantin Boudnik added a comment - Is it still a blocker for 0.8.0? Doesn't looks like it considering that it has been sitting idle for 5 weeks. Moving to 0.9.0 for now. Feel free to move back if the patch is there and ready for commit. RC1 is building as we speak
        Hide
        cos Konstantin Boudnik added a comment -

        Is it being worked on? A candidate for 1.1.0 perhaps?

        Show
        cos Konstantin Boudnik added a comment - Is it being worked on? A candidate for 1.1.0 perhaps?
        Hide
        mackrorysd Sean Mackrory added a comment -

        I was waiting to revisit this until we shipped Hadoop 2.6.0 (which contained YARN-1796), which we are now. I'll re-test the proposed permissions and resubmit a patch. But given things are currently working without ideal permissions, this doesn't need to block 1.0.0 if that's what you're asking. Perfectly fine for this to wait until 1.1.0 if we'd rather be stabilizing things.

        Show
        mackrorysd Sean Mackrory added a comment - I was waiting to revisit this until we shipped Hadoop 2.6.0 (which contained YARN-1796 ), which we are now. I'll re-test the proposed permissions and resubmit a patch. But given things are currently working without ideal permissions, this doesn't need to block 1.0.0 if that's what you're asking. Perfectly fine for this to wait until 1.1.0 if we'd rather be stabilizing things.
        Hide
        cos Konstantin Boudnik added a comment -

        I was trying to make sure if this going to be ready by 1.0 time. If so and given it is ready - let's commit it.

        Show
        cos Konstantin Boudnik added a comment - I was trying to make sure if this going to be ready by 1.0 time. If so and given it is ready - let's commit it.
        Hide
        mackrorysd Sean Mackrory added a comment -

        Just tested the patch from Jan 7, 2014 on a new build of Hadoop 2.6.0 and was able to start YARN and run the compute Pi MR job... Do I have a +1 to commit?

        Show
        mackrorysd Sean Mackrory added a comment - Just tested the patch from Jan 7, 2014 on a new build of Hadoop 2.6.0 and was able to start YARN and run the compute Pi MR job... Do I have a +1 to commit?
        Hide
        evans_ye Evans Ye added a comment -

        Let me run a test on this. There will be a result within a day.

        Show
        evans_ye Evans Ye added a comment - Let me run a test on this. There will be a result within a day.
        Hide
        evans_ye Evans Ye added a comment -

        +1 to the 4754 patch. It is still valid. I'll commit this later.

        Show
        evans_ye Evans Ye added a comment - +1 to the 4754 patch. It is still valid. I'll commit this later.
        Hide
        mackrorysd Sean Mackrory added a comment -

        Just pushed the 4754 patch. Thanks!

        Show
        mackrorysd Sean Mackrory added a comment - Just pushed the 4754 patch. Thanks!
        Hide
        evans_ye Evans Ye added a comment -

        Thanks for doing the commit Sean Mackrory.

        Show
        evans_ye Evans Ye added a comment - Thanks for doing the commit Sean Mackrory .

          People

          • Assignee:
            mackrorysd Sean Mackrory
            Reporter:
            mackrorysd Sean Mackrory
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development