  Hadoop HDFS / HDFS-945

Make NameNode resilient to DoS attacks (malicious or otherwise)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None

      Description

      We've seen defective applications cause havoc on the NameNode, e.g. by issuing 100k+ 'listStatus' calls on very large directories (60k files).

      I'd like to start a discussion around how we prevent such applications, and possibly malicious ones in the future, from taking down the NameNode.

      Thoughts?

          Activity

          aw Allen Wittenauer added a comment -

          I'm going to dupe this to HADOOP-9194.

          aw Allen Wittenauer added a comment -

          QoS (which is really what we're talking about here) is better done at the application layer, IMO. Passing this work off to an already overworked iptables (which is providing security since hadoop doesn't have much of any) is an idea that won't scale, esp at Yahoo! levels.

          cos Konstantin Boudnik added a comment -

          It might also make sense to focus on detecting such attacks and counteracting them with, say, iptables filtering to cut off an intruder or an honest fool.

          sharadag Sharad Agarwal added a comment -

          Some things, like rate limiting, are applicable to MapReduce as well, so I assume we would have it in the RPC layer, though there may be use cases for different limits on different operations.

          eli Eli Collins added a comment -

          Let's keep this jira to mechanism, can discuss policy (eg defaults) later. If we really want to prevent DoS then HDFS needs some notion of QOS reservations (and even then you can just use more clients with different IPs etc). I suspect for now it's best to try to limit the impact of any one operation (eg lsr /), and then rate limit operations by client. Agree with Todd about focusing on the non-malicious use case first.
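
One way to picture "limit the impact of any one operation" is a listing call that returns at most a fixed number of entries and lets the client resume from the last name it saw. The sketch below is only an illustration of that idea under stated assumptions: `BoundedLister`, `Page`, `maxPerCall`, and `startAfter` are hypothetical names, not HDFS classes or parameters, and the directory is modeled as a sorted in-memory set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch: cap the work done by a single listing call. */
final class BoundedLister {
    /** One batch of results plus a flag telling the client to ask again. */
    static final class Page {
        final List<String> entries;
        final boolean hasMore;

        Page(List<String> entries, boolean hasMore) {
            this.entries = entries;
            this.hasMore = hasMore;
        }
    }

    private final NavigableSet<String> children; // sorted child names
    private final int maxPerCall;                // assumed per-call cap

    BoundedLister(Iterable<String> names, int maxPerCall) {
        this.children = new TreeSet<>();
        for (String n : names) {
            this.children.add(n);
        }
        this.maxPerCall = maxPerCall;
    }

    /** Returns up to maxPerCall names strictly after startAfter (null = from the start). */
    Page list(String startAfter) {
        NavigableSet<String> tail =
            startAfter == null ? children : children.tailSet(startAfter, false);
        List<String> out = new ArrayList<>();
        boolean hasMore = false;
        for (String name : tail) {
            if (out.size() == maxPerCall) {
                hasMore = true; // stop early; the client must issue another call
                break;
            }
            out.add(name);
        }
        return new Page(out, hasMore);
    }
}
```

A client would call `list(null)` and then keep calling `list(lastNameSeen)` while `hasMore` is true, so a very large directory costs many small calls rather than one large one.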

          tlipcon Todd Lipcon added a comment -

          "apps within trusted network that does not need to be paranoid about this"

          I disagree. It's very easy to write a well-meaning app that does awful things. Certainly the limits should be configurable, but they should default on and be set at a high enough threshold that they only trigger in a case where it would make the NN fall over otherwise.

          kaykay.unique Karthik K added a comment -

          Echoing Zlatin's comment - this should be an optional feature, for those apps within trusted network that does not need to be paranoid about this.

          zlatinb Zlatin Balevsky added a comment -

          Any type of rate-limiting should be either optional or configurable on a per-application basis.

          tlipcon Todd Lipcon added a comment -

          What's the scope of this? It seems there are a number of DoS scenarios to worry about:

          • RPC flooding (as you noted above)
          • Malformed packets (it's probably not too hard to find a spot where you can make the NN allocate way too much memory and crash some important thread)
          • Open socket limit exhaustion - what if a client just opened thousands of connections to the NN's RPC ports without actually sending commands? At some point you'll hit the ulimit -n
          • lots of others

          I imagine some of these are high priority and others less so. Focusing on non-malicious clients first is probably easiest. Although bugs can make non-malicious clients act like malicious ones for sure, I think your point is good that we should focus on well-meaning but stupid applications first.

          philip Philip Zeyliger added a comment -

          A solution is rate limiting (by client IP or user id or something else). It ain't fool-proof, but it would probably get the job done.
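
As a rough illustration of rate limiting keyed by client IP or user id, here is a minimal token-bucket sketch. Everything in it is an assumption for the sake of the example: `ClientRateLimiter`, `tryAcquire`, and the rate and burst parameters are hypothetical names, not part of Hadoop's RPC layer, and the clock is injected only so the behavior can be checked deterministically.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-client token-bucket limiter; not a Hadoop class. */
final class ClientRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now) {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    private final double ratePerSecond; // tokens added per second
    private final double burst;         // bucket capacity
    private final LongSupplier clock;   // nanosecond clock, e.g. System::nanoTime
    private final ConcurrentHashMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    ClientRateLimiter(double ratePerSecond, double burst, LongSupplier nanoClock) {
        this.ratePerSecond = ratePerSecond;
        this.burst = burst;
        this.clock = nanoClock;
    }

    /** Returns true if the call from clientKey (IP, user id, ...) may proceed. */
    boolean tryAcquire(String clientKey) {
        long now = clock.getAsLong();
        // A new client starts with a full bucket.
        Bucket b = buckets.computeIfAbsent(clientKey, k -> new Bucket(burst, now));
        synchronized (b) {
            double elapsedSec = (now - b.lastRefillNanos) / 1e9;
            if (elapsedSec > 0) {
                // Refill in proportion to elapsed time, capped at the burst size.
                b.tokens = Math.min(burst, b.tokens + elapsedSec * ratePerSecond);
                b.lastRefillNanos = now;
            }
            if (b.tokens >= 1.0) {
                b.tokens -= 1.0;
                return true;
            }
            return false; // over the limit; the caller decides whether to reject or delay
        }
    }
}
```

Note that this sketch never evicts idle buckets, so a flood of distinct client keys would itself grow memory without bound; a real design would need expiry, and as noted in the thread, keying by IP does not stop a client that spreads its load across many addresses.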


            People

            • Assignee: Unassigned
            • Reporter: acmurthy Arun C Murthy
            • Votes: 0
            • Watchers: 26