Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4572

Can not access user logs - Jetty is not configured by default to serve aliases/symlinks

    Details

    • Hadoop Flags:
      Reviewed

      Description

      The task log servlet can no longer access user logs because MAPREDUCE-2415 introduce symlinks to the logs and jetty is not configured by default to serve symlinks.

      1. MAPREDUCE-4572_trunk.patch
        3 kB
        Ahmed Radwan
      2. MAPREDUCE-4572.patch
        2 kB
        Ahmed Radwan

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Ahmed and Alejandro convinced me, since this endpoint is admin-only, and admins already have the ability to get into the cluster.

          Show
          Todd Lipcon added a comment - Ahmed and Alejandro convinced me, since this endpoint is admin-only, and admins already have the ability to get into the cluster.
          Hide
          Alejandro Abdelnur added a comment -

          @Todd, at the moment this is accessible to admins only, the /logs endpoint is served by AdminAuthorizedServlet, which is different from /tasklogs endpoint, served by TaskLogServlet. The later is only accessible to regular users.

          Show
          Alejandro Abdelnur added a comment - @Todd, at the moment this is accessible to admins only, the /logs endpoint is served by AdminAuthorizedServlet , which is different from /tasklogs endpoint, served by TaskLogServlet . The later is only accessible to regular users.
          Hide
          Todd Lipcon added a comment -

          Though, looking at the Jetty code, we already have a vulnerability here, because its symlink check is racy (TOCTOU). So maybe we should just say: any time we use Jetty's built in file servlet, we need to make sure it is either accessing a directory which isn't user-writable, or it is accessible to admins only.

          Show
          Todd Lipcon added a comment - Though, looking at the Jetty code, we already have a vulnerability here, because its symlink check is racy (TOCTOU). So maybe we should just say: any time we use Jetty's built in file servlet, we need to make sure it is either accessing a directory which isn't user-writable, or it is accessible to admins only.
          Hide
          Daryn Sharp added a comment -

          I'm kind of with Todd. Symlink attacks are very difficult to identify or prevent. I'd lean towards a solution that doesn't require an admin to open a hole.

          Show
          Daryn Sharp added a comment - I'm kind of with Todd. Symlink attacks are very difficult to identify or prevent. I'd lean towards a solution that doesn't require an admin to open a hole.
          Hide
          Ahmed Radwan added a comment -

          Sorry for the confusion in the description, it is the TaskTracker /logs servlet. I see your points and understand your concerns. As you know, MAPREDUCE-2415 introduced these new design changes and symlinks. So with this new design, and without this patch, these symlinks in userlogs are not longer served. I agree that the use of symlinks and the way of serving them need to be revisited in a more generic way.

          Regarding this ticket, what do you think about amending the current patch to have this "aliases serving" disabled by default, and also amending the docs for this added property to highlight these security considerations.

          This seems sufficient since we are relying on the admin to explicitly enable this property. Additionally, this servlet is admin authorized, so normal or malicious users won't have access, and won't be able to view unauthorized contents through this servlet. What do you think?

          Show
          Ahmed Radwan added a comment - Sorry for the confusion in the description, it is the TaskTracker /logs servlet. I see your points and understand your concerns. As you know, MAPREDUCE-2415 introduced these new design changes and symlinks. So with this new design, and without this patch, these symlinks in userlogs are not longer served. I agree that the use of symlinks and the way of serving them need to be revisited in a more generic way. Regarding this ticket, what do you think about amending the current patch to have this "aliases serving" disabled by default, and also amending the docs for this added property to highlight these security considerations. This seems sufficient since we are relying on the admin to explicitly enable this property. Additionally, this servlet is admin authorized, so normal or malicious users won't have access, and won't be able to view unauthorized contents through this servlet. What do you think?
          Hide
          Alejandro Abdelnur added a comment -

          Just to be clear, the configuration that this patch is setting in Jetty only affects the 'log/' context which only has the servlet serving the logs..

          Show
          Alejandro Abdelnur added a comment - Just to be clear, the configuration that this patch is setting in Jetty only affects the 'log/' context which only has the servlet serving the logs..
          Hide
          Steve Loughran added a comment -

          I'd go for a solution that doesn't do symlinks. If it's the TaskLog servlet, then surely it is what needs changing.

          Show
          Steve Loughran added a comment - I'd go for a solution that doesn't do symlinks. If it's the TaskLog servlet, then surely it is what needs changing.
          Hide
          Todd Lipcon added a comment -

          So we are basically here exposing the controls that Jetty already provides, and the users can make the decision, and based on the available security setup on their environment, they can make a decision of enabling or disabling this behavior. What do you think

          I don't think most users understand the significance of the security issue here. Symlink attacks can be pretty darn subtle, and I think a well-intentioned administrator will end up opening a really bad hole just to get around the issue. Instead, we should figure out a secure way to solve the original use case. (which I'm still confused about, per above, because the description mentions the TaskLog servlet).

          Show
          Todd Lipcon added a comment - So we are basically here exposing the controls that Jetty already provides, and the users can make the decision, and based on the available security setup on their environment, they can make a decision of enabling or disabling this behavior. What do you think I don't think most users understand the significance of the security issue here. Symlink attacks can be pretty darn subtle, and I think a well-intentioned administrator will end up opening a really bad hole just to get around the issue. Instead, we should figure out a secure way to solve the original use case. (which I'm still confused about, per above, because the description mentions the TaskLog servlet).
          Hide
          Todd Lipcon added a comment -

          Also, the description here talks about the Task Log servlet, but the patch itself changes the /logs servlet, which isn't used for fetching task logs.

          It may be that this isn't problematic since the /logs servlet is already restricted to only admin-authorized users, but it still makes me somewhat nervous. So we should discuss.

          Show
          Todd Lipcon added a comment - Also, the description here talks about the Task Log servlet, but the patch itself changes the /logs servlet, which isn't used for fetching task logs. It may be that this isn't problematic since the /logs servlet is already restricted to only admin-authorized users, but it still makes me somewhat nervous. So we should discuss.
          Hide
          Ahmed Radwan added a comment -

          Thanks Todd, The patch adds a configuration property to control this behavior. So serving aliases can be disabled if needed (I mentioned that in my previous comment when discussing the security reasons and why we exposed this conf property). So we are basically here exposing the controls that Jetty already provides, and the users can make the decision, and based on the available security setup on their environment, they can make a decision of enabling or disabling this behavior. What do you think?

          Show
          Ahmed Radwan added a comment - Thanks Todd, The patch adds a configuration property to control this behavior. So serving aliases can be disabled if needed (I mentioned that in my previous comment when discussing the security reasons and why we exposed this conf property). So we are basically here exposing the controls that Jetty already provides, and the users can make the decision, and based on the available security setup on their environment, they can make a decision of enabling or disabling this behavior. What do you think?
          Hide
          Todd Lipcon added a comment -

          I'm concerned this might be a security hole. Are there any user-writable places that are served by jetty's file servlet? I think the user might be able to do something like drop a symlink into their log directory which points to the TT's keytab or another user's tokens. Without using the SecureIOUtils, this attack seems likely. I think we should revert this until we can do a detailed review for security.

          Show
          Todd Lipcon added a comment - I'm concerned this might be a security hole. Are there any user-writable places that are served by jetty's file servlet? I think the user might be able to do something like drop a symlink into their log directory which points to the TT's keytab or another user's tokens. Without using the SecureIOUtils, this attack seems likely. I think we should revert this until we can do a detailed review for security.
          Hide
          Alejandro Abdelnur added a comment -

          Thanks Ahmed. Committed to trunk, branch-1 and branch-2.

          Show
          Alejandro Abdelnur added a comment - Thanks Ahmed. Committed to trunk, branch-1 and branch-2.
          Hide
          Mayank Bansal added a comment -

          Yeah Ahmed make sense.
          +1

          Thanks,
          Mayank

          Show
          Mayank Bansal added a comment - Yeah Ahmed make sense. +1 Thanks, Mayank
          Hide
          Alejandro Abdelnur added a comment -

          +1 for trunk as well. I think we should leave an ON/OFF switch for the reasons Ahmed mentioned.

          Show
          Alejandro Abdelnur added a comment - +1 for trunk as well. I think we should leave an ON/OFF switch for the reasons Ahmed mentioned.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12542159/MAPREDUCE-4572_trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javac. The applied patch generated 2059 javac compiler warnings (more than the trunk's current 2058 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2766//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2766//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2766//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542159/MAPREDUCE-4572_trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javac. The applied patch generated 2059 javac compiler warnings (more than the trunk's current 2058 warnings). +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2766//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2766//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2766//console This message is automatically generated.
          Hide
          Ahmed Radwan added a comment -

          Thanks Mayank for the review! I thought about that too. I think it is useful to have the option to disable it in case - for security reasons - the user wants to disable serving aliases. Especially, when it is disabled, although the the task logs are no longer available via logs/userlogs, but they can be available from the job history server. After all, the ability to configure and disable this property is not harmful, but adds more flexibility. What do you think?

          Show
          Ahmed Radwan added a comment - Thanks Mayank for the review! I thought about that too. I think it is useful to have the option to disable it in case - for security reasons - the user wants to disable serving aliases. Especially, when it is disabled, although the the task logs are no longer available via logs/userlogs, but they can be available from the job history server. After all, the ability to configure and disable this property is not harmful, but adds more flexibility. What do you think?
          Hide
          Mayank Bansal added a comment -

          Hi Ahmed,

          The only comment I have is, do we really need configuration to enable/disable it?

          Why we cant just make it enable all the time? As we already have so much configuration.

          Rest, patch looks good.

          Thanks,
          Mayank

          Show
          Mayank Bansal added a comment - Hi Ahmed, The only comment I have is, do we really need configuration to enable/disable it? Why we cant just make it enable all the time? As we already have so much configuration. Rest, patch looks good. Thanks, Mayank
          Hide
          Ahmed Radwan added a comment -

          Thanks Tucu for the review! I wanted to incorporate any comments on the original patch before preparing a corresponding one for trunk. Here is the trunk version.

          Show
          Ahmed Radwan added a comment - Thanks Tucu for the review! I wanted to incorporate any comments on the original patch before preparing a corresponding one for trunk. Here is the trunk version.
          Hide
          Alejandro Abdelnur added a comment -

          +1. I assume we need something similar for Hadoop 2, no?

          Show
          Alejandro Abdelnur added a comment - +1. I assume we need something similar for Hadoop 2, no?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541849/MAPREDUCE-4572.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2756//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12541849/MAPREDUCE-4572.patch against trunk revision . -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2756//console This message is automatically generated.
          Hide
          Ahmed Radwan added a comment -

          Here is a patch that solves this issue. The alias serving behavior is now controlled via a new configuration property. I have manually tested the patch on a single-node cluster and confirmed that aliases are now served if the property is set to "true" and are not served when it is set to "false". Default is "true".

          Show
          Ahmed Radwan added a comment - Here is a patch that solves this issue. The alias serving behavior is now controlled via a new configuration property. I have manually tested the patch on a single-node cluster and confirmed that aliases are now served if the property is set to "true" and are not served when it is set to "false". Default is "true".
          Hide
          Ahmed Radwan added a comment -

          Trying to access the logs will get:

          HTTP ERROR 404
          
          Problem accessing /logs/userlogs/job_201208211508_0001/attempt_201208211508_0001_m_000000_0/. Reason:
          
              NOT_FOUND
          
          Show
          Ahmed Radwan added a comment - Trying to access the logs will get: HTTP ERROR 404 Problem accessing /logs/userlogs/job_201208211508_0001/attempt_201208211508_0001_m_000000_0/. Reason: NOT_FOUND

            People

            • Assignee:
              Ahmed Radwan
              Reporter:
              Ahmed Radwan
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development