Sorry to interrupt again, but Jan is clearly still the one who really understands what this is and is not about. It is a shame.
In this discussion "security" seems to be just one thing. People talk about aspects of security that this issue has nothing to do with AT ALL.
Security is about a lot of aspects, among others:
- Authentication: Allowing the client to identify itself as being someone or something, and to do it in a way so that you (the server-side) trust it. Examples of someone/something:
- a) The one(s) knowing a specific set of username/password
- b) The one(s) holding the private part of a certificate-pair (e.g. RSA)
- c) The one(s) able to send requests from a machine with a certain IP-address
- Authorization: Basically a map from the "key" associated with your authentication to a set of things you are allowed to do. "Things you are allowed to do" can e.g. be "functions/operations you are allowed to do" or more fine-grained "data you are allowed to read or update or delete". For a) the "key" will be the username. For b) the "key" will be the public part of the certificate-pair (which you know in advance) and for c) the "key" will be the IP-address
- Integrity: E.g. on transport-layer, that data has not been changed on the way between the client (the authenticated party) sending it and the server receiving it
- Confidentiality: E.g. on transport-layer, that data has not been read (and understood) on the way between the client sending it and the server receiving it
- Dealing with the aspects on storage-, application-, transport-, OS-, etc-levels
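To make the authorization bullet concrete, here is a minimal sketch of "a map from the key to a set of things you are allowed to do". The keys ("search-user", "admin-user") and the operation names are only illustrative assumptions; this is not code from Solr or from any webcontainer.

```java
import java.util.Map;
import java.util.Set;

public class AuthorizationMap {
    // Hypothetical keys and operations, for illustration only.
    // The key is whatever authentication established: a username,
    // a public certificate part, or an IP-address.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "search-user", Set.of("search"),
            "admin-user", Set.of("search", "update", "delete"));

    // An unknown key is allowed nothing (deny by default).
    static boolean isAllowed(String key, String operation) {
        return ALLOWED.getOrDefault(key, Set.of()).contains(operation);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("search-user", "search"));
        System.out.println(isAllowed("search-user", "update"));
    }
}
```

Authentication only answers "which key is this?"; the lookup above is the separate authorization step.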
Different kinds of technology claim to be able to guarantee different sets of those aspects. E.g. a webcontainer is required to be able to deal with authentication and authorization on application-layer and integrity and confidentiality on transport-layer - at least if it wants to be certified, because that is part of the spec it needs to implement. The fact that a webcontainer is able to deal with those things for you is something people would like to take advantage of - whether you guys are willing to accept it or not. At my company we want to do it, and as I understand it, Jan knows about at least two others that also want to do it. I would claim that at least 90% of Solr users that want "to do security" would like to take advantage of the fact that a webcontainer can handle those things for you.
Providing security features is not just something you do, like adding any other feature. You need to have people with a real security background who know what the fuck they are doing to ensure correctness. You need to deal with the inevitable security vulnerabilities and fixes to those. I don't think this is something our PMC should waste its time with.
I agree. But this is NOT about Solr going into "security" in the way that "we handle/guarantee this and that kind of security aspect for you". That is still left to other technologies like e.g. the webcontainer. This issue is all about enabling a SolrCloud cluster to work, IF you (a Solr user) choose to have another technology enforce certain security aspects for you. If a Solr user sets up any kind of security technology that requires incoming traffic to a Solr-node to be authenticated (by HTTP basic auth), and this is also enforced for Solr-node to Solr-node traffic, the SolrCloud cluster will not work, and you cannot (easily) make it work in a secure way without Solr changes. If you use the webcontainer (running Solr) to enforce those security aspects, it will be enforced for Solr-node to Solr-node traffic also.
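For readers not familiar with what "authenticated by HTTP basic auth" means on the wire: every request, including Solr-node to Solr-node requests, has to carry an Authorization header built from the username and password. A minimal sketch of building that header value follows; the class and method names are hypothetical and this is not the code from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Builds the value of the HTTP "Authorization" header for basic auth:
    // the word "Basic", a space, and base64("username:password").
    static String basicAuthHeaderValue(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Example credentials taken from the HTTP basic auth RFC, not from Solr.
        System.out.println(basicAuthHeaderValue("Aladdin", "open sesame"));
    }
}
```

Note that base64 is an encoding, not encryption, which is why basic auth is normally combined with confidentiality on transport-layer (HTTPS or IPsec).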
So when issues come up about security-related things in lucene/solr that can be done elsewhere outside of it in a more secure way instead (e.g. encrypting directories and so on), you can expect me to push back too.
You might be able to do it "elsewhere" but it will certainly not be "more secure".
Encrypting directories is something completely different. It deals with confidentiality aspects on storage-level. This issue has absolutely nothing to do with that.
we could recommend ipsec instead for example
IPsec is mainly about integrity and confidentiality on transport-level (IP-layer). That is a valid alternative to letting the webcontainer deal with integrity and confidentiality on transport-level (basically requiring HTTPS transport). Using IPsec for authentication and authorization is very complicated, and unless you really want to do major work, it is only able to deal with those aspects based on certificates. People want to use usernames and passwords in a lot of use-cases. You do not see facebook or twitter or ... wanting you to generate your own RSA-certificate-pairs and send them the public part of it. I know Solr has the philosophy that it is not supposed to be exposed directly - instead it should be exposed indirectly through some kind of gateway (where authentication and authorization wrt "outer users" can be enforced). But if you are fairly paranoid you do not necessarily want to trust those gateways (they might do bad things both intentionally and unintentionally), and therefore you will also likely want to set up security around your SolrCloud cluster itself. Activating the ability of the webcontainers (the ones running Solr) to do it for you is just an obvious way.
That's why I don't understand how someone would use "forwarding credentials" feature if Solr does not provide any way (best practices, recipes, whatever) to enforce authz policies / security. How do you do that in your application? How do you specify who can do what? Where do you enforce that - in custom UpdateProcessor, SearchComponent, SolrDispatchFilter?
We use the webcontainer's ability to enforce those aspects of security. For recipes I have added a lot to http://wiki.apache.org/solr/SolrSecurity - go read.
To spell it out, we do the following:
- Add to Solr web.xml AT THE VERY TOP
- Add to Solr web.xml (at the spot where it belongs)
<web-resource-name>All resources need authentication</web-resource-name>
- Add to jetty.xml
<Set name="name">My Realm</Set>
This basically asks jetty to handle authentication and authorization for you - AT application-layer. See details on http://wiki.apache.org/solr/SolrSecurity about how it works and why it is done the way it is.
- Actually we only let the webcontainer deal with the authentication part. We want to do authorization based on URL-patterns, which a webcontainer is able to do. But due to limitations of the <url-pattern>, the way Solr-URLs are structured, and our requirements for URL-based authorization (basically we want a "search-user" allowed to do searches only and an "admin-user" allowed to do anything), we need to deal with authorization in another way. We deal with URL-based authorization by adding the RegExpAuthorizationFilter filter in web.xml. It does URL-pattern authorization, just as the webcontainer itself is able to do for you, but this solution allows reg-exp URL-patterns, enabling us to enforce the rules we want.
- We use our own Realm, but you can use one of those that come out of the box with jetty - see http://wiki.apache.org/solr/SolrSecurity. Our realm is using data that we put in ZooKeeper. ZooKeeper has some properties that make it a nice persistence layer for a realm. It is distributed (unlike local files), so it is easy to make sure that all Solr-nodes at any time authenticate and authorize with the same set of credentials and roles. It also has this nice push-thing (watchers) enabling us to make changes in the realm-data-foundation (in ZK) and have all realms (living in the webcontainers) be aware of the changes without having to poll for changes all the time.
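To illustrate the reg-exp URL authorization idea from the list above, here is a rough, self-contained sketch. It is NOT the actual RegExpAuthorizationFilter (that one is a servlet filter configured in web.xml); the class name, the patterns, the "first matching rule decides" ordering and the role names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class RegExpAuthorizationSketch {
    // One rule: a reg-exp on the request path and the roles allowed for it.
    private static final class Rule {
        final Pattern pattern;
        final Set<String> roles;
        Rule(String regex, Set<String> roles) {
            this.pattern = Pattern.compile(regex);
            this.roles = roles;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void addRule(String regex, Set<String> roles) {
        rules.add(new Rule(regex, roles));
    }

    // The first rule whose pattern matches the whole path decides.
    // If no rule matches, access is denied.
    boolean isAllowed(String path, Set<String> userRoles) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(path).matches()) {
                for (String role : userRoles) {
                    if (rule.roles.contains(role)) {
                        return true;
                    }
                }
                return false;
            }
        }
        return false;
    }

    // Hypothetical rule set: search-user may only search, admin-user may do anything.
    static RegExpAuthorizationSketch example() {
        RegExpAuthorizationSketch authz = new RegExpAuthorizationSketch();
        authz.addRule("/solr/[^/]+/select", Set.of("search-user", "admin-user"));
        authz.addRule("/solr/.*", Set.of("admin-user"));
        return authz;
    }

    public static void main(String[] args) {
        RegExpAuthorizationSketch authz = example();
        System.out.println(authz.isAllowed("/solr/collection1/select", Set.of("search-user")));
        System.out.println(authz.isAllowed("/solr/collection1/update", Set.of("search-user")));
    }
}
```

A rule like the first one, with the collection name in the middle of the path, is the kind of thing the plain <url-pattern> cannot express, which is why we went with reg-exps.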
supporting basic auth and https primarily at the container level is much less controversial
Agree. That is exactly why you want to enable a SolrCloud cluster to still work, if the Solr admin chooses to let the container enforce that kind of security
SolrCloud is still at an early enough phase that I'm not really willing to spend a lot of time considering security as I add new features or refactor older code. Nor do I want to be on the line when some big company has a security breach due to my code changes.
You will not have to deal with the big company. If the enforcement of security does not work, it is because the technology they use to enforce it does not work. Solr is not enforcing security - the webcontainer or something else is. This patch only introduces the ability in SolrCloud to keep working if the Solr admin chooses to let the container handle security.
can you setup some basic auth at the container level
No, basically not. Not before this patch.
...and run most things over https?
No, but that is another issue with Solr - or at least it was the last time I checked it. IPsec is a valid alternative, though.
I think ssl stuff should be working after recent http client upgrade and switch to SystemDefaultHttpClient. Now I believe you can set up your key and trust stores using standard Java properties and it should work.
Well, you are kind of right, even though you mix up concepts a little. SSL (or HTTPS = HTTP over SSL) is about the transport-layer - nothing to do with this issue, SystemDefaultHttpClient or key- and trust-stores. But SSL uses certificate-pairs to do the encryption over the transport-layer. Those same certificate-pairs can be used for authentication, but that is another aspect and has nothing to do with SSL. And there is one big difference in how easy it is to use certificates for encrypted transport vs using them for authentication. To use them for authentication you need to pre-exchange the public parts of your certificates. To use them for encryption on the transport-layer you do not have to pre-exchange.
SystemDefaultHttpClient enables us to do certificate-based authentication, yes. But it requires setting up key- and trust-stores and pre-exchanging certificates. People want to use username/password based authentication in a lot of use-cases.
Correct me if I'm wrong, but all handling of security on inbound requests to Solr is still handled fully by the container, even with this patch. I.e. no code that you add to SolrCloud will be able to open a hole for accepting incoming search requests that should not have been accepted. The user configures the realms, user/pass etc fully on the container level.
With one exception, and that is the -DinternalAuthCredentialsBasicAuthPassword=<password> passed to Solr code, enabling system-initiated inter-node communication. If this is snapped up by foreigners, they potentially gain full access to Solr if they have physical network access. We should find a better way than passing this on the command-line
Agree with your concern. The VM-param way of handing over the passwords to Solr is the easiest way, though. I wanted to limit the patch, so that is the only way directly supported for now. But the solution actually does enable you to do it a different way. You can override the CredentialsProviders, or you can choose to use the default one (finding credentials in VM params) but not set the VM params using the command-line. We do the latter in my company - basically we pipe credentials in through stdin, have a small bean reading from stdin when the container starts, and add whatever it reads as VM params. Voila, passwords not to be found in environment or on command-line (exposed by e.g. "ps -ef" or "ps eww").
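A rough sketch of the stdin idea, assuming that "adding as VM params" amounts to setting Java system properties, and assuming a simple name=value line format. The class name and the line format are made up for this example (our actual bean is not shown here); only the property name internalAuthCredentialsBasicAuthPassword comes from this issue.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StdinCredentialsLoader {
    // Reads "name=value" lines and sets each as a Java system property.
    // Returns the number of properties set. Lines without a name are skipped.
    static int loadInto(Reader source) {
        int count = 0;
        try {
            BufferedReader reader = new BufferedReader(source);
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq <= 0) {
                    continue; // blank or malformed line
                }
                // The value is taken verbatim; passwords may contain spaces or '='.
                System.setProperty(line.substring(0, eq).trim(), line.substring(eq + 1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Must run early at container start, before the properties are read.
        loadInto(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    }
}
```

The point is only that the password arrives through a pipe instead of through argv or the environment, so it does not show up in process listings.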