Hadoop Map/Reduce / MAPREDUCE-3231

Improve Application Master And Job History UI Security

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels:
      None

      Description

      I propose a stripped-down, JSON-based protocol for creating safe user-generated web pages. This JIRA is intended first of all as a place to discuss the proposal; if there are no serious objections, it will then become an umbrella JIRA for implementing the proposed changes.

      1. AMWebSecurityProposal.pdf
        41 kB
        Robert Joseph Evans

          Activity

          Robert Joseph Evans added a comment -

          The attached proposal is for what I would like to see happen to the AM UI, web proxy, and job history server. I would like any feedback on this proposal that the community has.

          Arun C Murthy added a comment -

          I think this is going to entail a long discussion... I doubt we can make progress on this in the near term.

          Robert Joseph Evans added a comment -

          I agree that this is going to require quite a bit of discussion. I think the proxy solution is good enough for the short term, but I don't think it is a good long term solution. I would rather get the correct solution in place even if it takes a while to do it.

          Luke Lu added a comment -

          If I understand your proposal correctly, you're trying to invent a less powerful but "more secure" alternative language to html/js/css for a trusted web server (essentially a proxy) to assemble html/js/css for end users. Besides the complexity of the approach, which you seem to underestimate (e.g., you'll have to at least invent a robust stream-based JSON parser that can handle adversarially long names and values, which doesn't exist yet, with a compatible open source license anyway), it's a non-starter for deployments that do not require such security and/or have a commercial transparent proxy that can handle the webapp security just fine. A fundamental requirement for Hadoop security is that it must be optional and pluggable. Your proposal requires people to rewrite their webapps in your extremely restrictive way. It's fundamentally wrong on so many levels. The web proxy design (in MAPREDUCE-2858) in conjunction with code whitelisting can give users complete freedom in AM UI design, while adequately ensuring security when it's needed.

          I'm strongly -1 on any proposal that imposes mandatory significant restriction on people's freedom to create their own web UI in the cloud/cluster/grid.

          Robert Joseph Evans added a comment -

          @Luke

          I understand that you want a great deal of flexibility in how a web page is created, and I agree that we do not want to tie people's hands by saying that there is no way they can create a new UI. But security by its nature is restrictive. Security is saying "I'm sorry, but you are trying to do something that is either unsafe, unwise, or beyond what you have been granted the power to do." I do not believe that this proposal will "impose mandatory significant restriction on people's freedom to create their own web UI". If you want to create a UI that is brand new in every way, you can:

          {
            "template": ["org.apache.hadoop.luke.lu.LukesWonderfulNewWidget"],
            "data" : ...
          }
          

          The difference is that if you want to do it on a cluster that is managed by me, I have to give you approval to do that first, and install org.apache.hadoop.luke.lu.LukesWonderfulNewWidget on the proxy along with its dependencies. Also if I find out in the future that org.apache.hadoop.BobbysMaliciousWidget is doing something bad, that somehow was missed, I as the owner of this cluster can remove it from the classpath until we can either fix BobbysMaliciousWidget or determine what other actions need to be taken. Or if a critical vulnerability is found in jquery-datatable, I can upgrade jquery-datatable on the proxies that use it, and know that it is safe. I really don't know off the top of my head what steps would have to be taken if we are using JS code signing, beyond revoking all entries in the white list.
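
The approval step described above can be sketched roughly as follows. This is only an illustration in Python, not code from the attached proposal: the `APPROVED_WIDGETS` registry, `resolve_templates`, and `UnapprovedWidgetError` are hypothetical names, and the empty `"data"` object in the usage note stands in for the payload that the original snippet elides.

```python
import json

# Hypothetical registry of widget names that the cluster administrator has
# approved and installed on the proxy. Removing an entry "revokes" a widget.
APPROVED_WIDGETS = {
    "org.apache.hadoop.luke.lu.LukesWonderfulNewWidget",
}


class UnapprovedWidgetError(Exception):
    """Raised when a page asks for a template the proxy has not installed."""


def resolve_templates(page_json, approved=APPROVED_WIDGETS):
    """Return the template names requested by a page description.

    Raises UnapprovedWidgetError if any requested template is not in the
    administrator-controlled registry, so unknown widgets are never rendered.
    """
    page = json.loads(page_json)
    templates = page.get("template", [])
    if not isinstance(templates, list):
        raise ValueError("'template' must be a list of widget names")
    for name in templates:
        if name not in approved:
            raise UnapprovedWidgetError(name)
    return templates
```

With this shape, a page naming only `LukesWonderfulNewWidget` resolves normally, while a page naming `org.apache.hadoop.BobbysMaliciousWidget` is rejected until an administrator adds that name to the registry.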

          I would also like to add that warning someone about a potentially malicious page is not secure at all. People make bad choices, especially when it comes to clicking on something. I don't think that >90% of page views will be to pages served by an app master that is owned by that user. At least where I work, an entire group of people owns a set of map/reduce processes. Those processes are run as a user that represents those processes, not as any one user in that group. If they have to click through a warning every time they want to see a page, it will become an automatic reaction, and they will not think about it at all. I think the JS signing should handle most of this, but not all of it, especially if they modify the UI for something they specifically want.

          Yes, if someone does turn off security they still have to go through the process of installing new widgets on the proxy. But this also solves the problem of displaying History Server pages in a non-MR-specific manner. I have not seen any proposal so far explaining how that will happen. I saw a JIRA saying that we should do it, but no proposal about how to do it.

          Hadoop security has the design goals to be pluggable and optional. That is great but it does not mean that the Map Reduce Application Master does not have to do anything to be secure. It has to jump through all kinds of hoops so that it can be secure and be something that should be run on a secure cluster.

          The code is not going to be any more complex than Hamlet is. In fact, I see Hamlet being the main tool to handle all of that complexity; the difference is that instead of always writing out HTML, we could also write out JSON, or read the JSON back in. It will only take a few minor tweaks to Hamlet to allow it to do the streaming.

          As for streaming limits in JSON, that is a good point, but JSON is a relatively simple protocol that is used extensively, and I do not see any difficulty in adding something like this to Jackson, if it is not there already. Simply do a check while reading in a string, and if we have read more data into this string than the limit allows, throw an exception. It seems very straightforward. I know that "everyone else is doing it" is not a good logical argument, but if it is such a concern then we need to go rip JSON support out of webhdfs, because it uses Jackson and is vulnerable to a malicious client. I would also like to see how ProtocolBuffers and Avro handle this. I could not find any reference to size limits on the web. In fact, I saw someone saying that they have been able to send 45MB strings using ProtocolBuffers, so if there are limitations then we probably need to set them up, because I don't think they are enabled by default.
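
The length check described above can be sketched in a few lines. This is a minimal Python illustration of the idea only, not Jackson code and not a full JSON parser: `read_json_string`, `StringTooLongError`, and the `max_chars` parameter are hypothetical names, and escape sequences are kept raw rather than decoded.

```python
import io


class StringTooLongError(Exception):
    """Raised when a string value exceeds the configured limit."""


def read_json_string(stream, max_chars):
    """Read the body of one JSON string from a text stream.

    Assumes the opening quote has already been consumed. Characters are read
    one at a time, so an adversarially long value is rejected as soon as it
    crosses max_chars instead of being buffered in full.
    """
    pieces = []
    count = 0
    while True:
        ch = stream.read(1)
        if ch == "":
            raise ValueError("unterminated string")
        if ch == '"':
            # Unescaped closing quote: the string is complete.
            return "".join(pieces)
        if ch == "\\":
            # Keep the escape pair raw so an escaped quote does not end the string.
            nxt = stream.read(1)
            if nxt == "":
                raise ValueError("unterminated escape sequence")
            ch += nxt
        pieces.append(ch)
        count += len(ch)
        if count > max_chars:
            raise StringTooLongError(f"string exceeds {max_chars} characters")
```

For example, `read_json_string(io.StringIO('abc" rest'), 10)` returns `'abc'`, while a 1000-character value read with `max_chars=16` raises `StringTooLongError` after the seventeenth character.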

          Also, parsing out all JavaScript from inside all HTML pages, including any CSS that may contain it through browser-specific extensions, seems rather complex to me. It also seems like a bit of an arms race to stay on top of it all, as the web standards are constantly evolving and browser makers are adding in their own extensions. There are Flash and Java applet vulnerabilities that constantly pop up and that are not covered by the proposed code signing at all, and what about the native code execution extensions that Chrome is adding in? HTML is just way too complex for me to feel confident that a proxy is not exposing a vulnerability, especially when it is trying to protect against something that browser builders do not consider to be a vulnerability, like reading a cookie from within a web page that has the right to read that cookie.

          Yes, this proposal is more restrictive than allowing raw HTML/CSS/JS etc. That is because it is more secure than allowing it. Yes, it does provide a higher barrier to entry for someone who wants to write a new AppMaster and always run it on an insecure cluster, but having a GUI is not a requirement for an AppMaster, and it is nowhere near as complex as writing the rest of a basic AppMaster. I would like to hear what else is "fundamentally wrong on so many levels" so that I can address those points as well.

          Luke Lu added a comment -

          I would like to hear what else is "fundamentally wrong on so many levels" so that I can address them as well.

          1. First, the cluster is not owned by you. It's created by companies for better resource utilization with the goal of saving people's time in general. The most common use case of the AM UI is for users themselves. "Jump through all kinds of hoops" is a waste of time, especially when it's not necessarily secure (see below). While I appreciate your appreciation of the Hamlet abstraction, users should be able to use their favorite language/framework for their AM UI, especially when porting from existing apps.
          2. Inventing a new security scheme is almost always a bad idea, even for security experts. Having a trusted front-end with a special interpreter for your special scheme is a recipe for disaster. Writing secure and trusted webapps is hard even for experts. People are still finding security bugs in facebook and google years after they were created.
          3. Handling of raw HTML/CSS/JS is well studied by many in the industry (Caja, OWASP and ModSecurity etc.), there are both open source and commercial solutions to webapp security in general. We're merely taking advantage of our special case to eliminate false positives for users themselves.
          Robert Joseph Evans added a comment -

          users should be able to use their favorite language/framework for their AM UI, especially when porting from existing apps

          That is a good point. Porting a UI from existing applications would add extra overhead. But does Open MPI have an existing GUI? Do Giraph, Pig, or most of the other applications that are in the process of being ported have an existing GUI? About the only one that I can think of is Twitter Storm, and there has been no progress on that in quite a while, so I don't think it is that big of a deal.

          Handling of raw HTML/CSS/JS is well studied by many in the industry (Caja, OWASP and ModSecurity etc.)

          Didn't you say that you don't trust Caja? Why then didn't we go with a different library?

          Inventing a new security scheme is almost always a bad idea, even for security experts. Having a trusted front-end with a special interpreter for your special scheme is a recipe for disaster.

          So Wiki/Twiki are a bad idea? Because aren't they a trusted front-end with a special interpreter for a special scheme? Yes it is not all about security, but that is part of it because I would never go to Wikipedia if I thought I could easily get a virus from it.

          Writing secure and trusted webapps is hard even for experts. People are still finding security bugs in facebook and google years after they were created.

          Exactly, so why would I want to let a user run code with security errors in it and remove the possibility for me, as the administrator of a cluster, to fix those errors in a timely manner? Look at Pig with Oozie. Oozie requires that the Pig jars be placed in HDFS in a special directory so that they can be part of the distributed cache for Oozie to run. What I have seen in the real world is that people don't think too much about the version of Pig that they put out there until there is a problem that makes their code not run. I have seen very, very old versions of Pig that are no longer supported being run because there is no motivation to fix it.

          Luke Lu added a comment -

          so I don't think it is that big of a deal.

          Bobby, you're seriously underestimating the potential of MRv2. The number of enterprise applications (which means they have nice UIs) that can be ported to MR2 is orders of magnitude more than listed. People should worry more about using the container APIs than porting the UIs.

          Again, I'd like to call your attention to the fundamental wrongness of your approach as you fail/refuse to see it. The most common use case of the AM UI is for application users themselves, who are more than 99% of the cluster users. You're trying to force people to adopt a special way, and in a sense, raising the taxes of the 99% for the sake of the 1% (the self-entitlement permeates throughout your arguments). Sorry to use a cliché, but I was a 1% and I support the 99%.

          So Wiki/Twiki are a bad idea? Because aren't they a trusted front-end with a special interpreter for a special scheme?

          I can't speak for other wikis, but MediaWiki (the one that powers Wikipedia) had/has so many security bugs that I lost count. Many of the bugs are, not surprisingly, in the template interpreter and related code.

          because I would never go to Wikipedia if I thought I could easily get a virus from it.

          This is a classic example of a false sense of security through ignorance. You can get a virus from Wikipedia, even if JavaScript is turned off, especially if you use a common combination of browser and operating system versions. I don't think you're paranoid enough to write secure code yet.

          Didn't you say that you don't trust Caja? Why then didn't we go with a different library?

          I don't trust Caja yet, but its security track record is way better than MediaWiki's. The main issues with Caja are that it's currently too slow (multi-pass/stage compiler) for non-trivial pages and that it doesn't have an official release yet. The other solutions I mentioned are, not surprisingly, existing proxy solutions. I was actually thinking about creating a simple plugin interface to allow people to delegate to a deployment-chosen solution for common web security when the requester is not the owner of the AM.
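
One possible shape for such a plugin interface is sketched below. This is purely illustrative Python and not a design from this thread or from MAPREDUCE-2858: `WebSecurityFilter`, `BlockAll`, and `serve` are hypothetical names, and the placeholder plugin simply withholds the page rather than attempting real sanitization.

```python
from abc import ABC, abstractmethod


class WebSecurityFilter(ABC):
    """Hypothetical plugin point: a deployment supplies its own implementation."""

    @abstractmethod
    def filter(self, body: str) -> str:
        """Return a version of the page body that is safe to show to non-owners."""


class BlockAll(WebSecurityFilter):
    """Placeholder plugin that replaces untrusted content with a fixed notice."""

    NOTICE = "Content withheld: viewer is not the application owner."

    def filter(self, body: str) -> str:
        return self.NOTICE


def serve(body: str, requester: str, am_owner: str, plugin: WebSecurityFilter) -> str:
    """Pass the page through untouched for the AM owner; delegate otherwise."""
    if requester == am_owner:
        return body
    return plugin.filter(body)
```

A deployment that already runs a commercial or open source filtering proxy would wrap it in its own `WebSecurityFilter` subclass, and the owner's own page views would bypass the plugin entirely.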

          I have seen very, very old versions of Pig that are no longer supported being run because there is no motivation to fix it.

          I'm not sure if you're intentionally trying to conflate the issues here. I was talking about the difficulties of writing trusted webapps. The example you gave is neither and very much off topic. The main thrust of my original design is that a simple proxy can obviate the need to write complex (bug-prone) trusted webapps in our particular use cases.

          OTOH, I do see your perspective as an on-call dev, as I've been there as well. Users are always trying to minimize their own work and avoid upgrading if things are working for them. That's life, and being a control freak doesn't really help here. You should focus your energy on fixing the framework with reasonable security and limits, and let users work within the limits.

          Robert Joseph Evans added a comment -

          @Luke

          I obviously haven't convinced you. I have assigned the JIRA to you so you can do with it as you see fit. Feel free to close it or use it to make the changes that you wanted on it.


            People

            • Assignee:
              Luke Lu
              Reporter:
              Robert Joseph Evans
            • Votes:
              0
              Watchers:
              7
