SOLR-4470

Support for basic http auth in internal solr requests

    Details

      Description

      We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node.

      It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make "internal" requests to other Solr-nodes, and for those to work, credentials need to be provided there as well.

      Ideally we would like to "forward" credentials from a particular request to all the "internal" sub-requests it triggers, e.g. for search and update requests.

      But there are also "internal" requests

      • that are only indirectly/asynchronously triggered by "outside" requests (e.g. shard creation/deletion etc. based on calls to the "Collection API")
      • that have no relation to an "outside" "super"-request at all (e.g. replica syncing stuff)

      We would like to aim at a solution where the "original" credentials are "forwarded" when a request directly/synchronously triggers a sub-request, falling back to configured "internal credentials" for the asynchronous/non-rooted requests.

      In our solution we aim at only supporting basic HTTP auth, but we would like to build a "framework" around it, so that not too much refactoring is needed if support for other kinds of auth (e.g. digest) is added later.

      We will work on a solution, but are creating this JIRA issue early in order to get input/comments from the community as soon as possible.

      1. SOLR-4470_branch_4x_r1452629.patch
        274 kB
        Per Steffensen
      2. SOLR-4470_branch_4x_r1452629.patch
        273 kB
        Per Steffensen
      3. SOLR-4470_branch_4x_r1454444.patch
        307 kB
        Per Steffensen
      4. SOLR-4470_trunk_r1568857.patch
        271 kB
        Per Steffensen
      5. SOLR-4470.patch
        278 kB
        Jan Høydahl
      6. SOLR-4470.patch
        277 kB
        Jan Høydahl
      7. SOLR-4470.patch
        277 kB
        Jan Høydahl
      8. SOLR-4470.patch
        277 kB
        Jan Høydahl
      9. SOLR-4470.patch
        276 kB
        Jan Høydahl
      10. SOLR-4470.patch
        276 kB
        Jan Høydahl
      11. SOLR-4470.patch
        276 kB
        Jan Høydahl
      12. SOLR-4470.patch
        276 kB
        Jan Høydahl
      13. SOLR-4470.patch
        333 kB
        Jan Høydahl
      14. SOLR-4470.patch
        333 kB
        Jan Høydahl
      15. SOLR-4470.patch
        356 kB
        Jan Høydahl
      16. SOLR-4470.patch
        276 kB
        Jan Høydahl

          Activity

          Per Steffensen added a comment -

          Related solr-mailing-list discussion "Forwarding authentication credentials in internal node-to-node requests" from around 11-12 January 2013

          Per Steffensen added a comment - - edited

          The first idea was to put the "internal credentials" in solrconfig.xml, but solrconfig.xml is part of the configuration of one or more replicas/shards/collections, and the "internal credentials" really should not be "per shard" but "per node" or "per cluster". I guess "per node" could be implemented as either "put it in solr.xml" or "give it as VM-params", but I don't know if there are future plans on getting rid of solr.xml. "Per cluster" could be implemented as "put it in ZK (at a global location)".

          This will of course raise another issue of protecting access to solr.xml or ZK, but that is already relevant as things stand today. We have an upcoming US focusing on protecting stuff in ZK, so we plan to deal with that part. Files can be protected in misc ways depending on the OS.

          Opinions from the community please:

          • solr.xml?
          • "global" location in ZK?
          • or, do you think this is a solrconfig.xml thingy?
          Mark Miller added a comment -

          Yeah, it looks like solr.xml is going away shortly - per core stuff will go in a property file and per node stuff would, I guess, become system props.

          It does seem kind of annoying to have to configure it for every core rather than per node.

          That's all I've got at the moment - will think about it and see what anyone else chimes in with.

          Per Steffensen added a comment -

          It does seem kind of annoying to have to configure it for every core rather than per node.

          It can be done per node in solr.xml or using VM-params. But it is actually also annoying to have to do it for every node if you have a lot of them in your cluster. It really should be the same credentials across all nodes in the same cluster, so a cluster-global config would be best - e.g. in ZK.

          Until we get closer to an agreement on how it should be done, I will create a way to specify it in solr.xml or through VM-params - like zkHost, for example. This might also end up being the final solution for this ticket, and future tickets might then improve on it.
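A rough standalone sketch of the VM-params variant, with credentials passed the same way zkHost is today. The property names here are invented for illustration; nothing in this block is existing Solr configuration or API.

```java
// Sketch only: reads cluster-internal credentials from VM params, analogous
// to how zkHost is passed. Property names are hypothetical, not Solr's.
public class InternalCredentialsConfig {

    public static final String USER_PROP = "solr.internalAuthUsername";
    public static final String PASS_PROP = "solr.internalAuthPassword";

    // Returns {user, password}, or null when no internal auth is configured.
    public static String[] fromSystemProps() {
        String user = System.getProperty(USER_PROP);
        String pass = System.getProperty(PASS_PROP);
        if (user == null || pass == null) {
            return null; // internal requests go out without credentials
        }
        return new String[] { user, pass };
    }
}
```

A node would then be started with `-Dsolr.internalAuthUsername=... -Dsolr.internalAuthPassword=...` in the same script that already supplies zkHost.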

          Mark Miller added a comment -

          But it is actually also annoying to have to do it for every node,

          I suppose it depends. Usually node-wide setup is not too difficult for the clusters I've worked with, because I just set it up to be configured by sys props and add the sys props to the Solr start script I call on each node. I hand roll that with a shell script and ssh, but I'm sure chef- and puppet-type stuff is just as easy.

          Mark Miller added a comment -

          Like e.g. zkHost.

          I think that works fine. Little work, easy to configure for a cluster, and easy to change to something else if it ends up making sense.

          Uwe Schindler added a comment -

          According to HTTP-URL syntax, you can give the basic auth params using http://user:password@host:port/. As Solr is using Commons HttpClient for connections, wouldn't this work automatically if you configure the node's HTTP address using this syntax?

          Per Steffensen added a comment - - edited

          According to HTTP-URL syntax, you can give the basic auth params using http://user:password@host:port/. As Solr is using Commons HttpClient for connections, wouldn't this work automatically if you configure the node's HTTP address using this syntax?

          This is a way to specify the credentials, yes. But it can be done in other ways, and is already done in other ways in the code - see HttpClientUtil.setBasicAuth (I have to say that I am currently looking at 4.0 code, but I guess it hasn't changed). The problem is not so much setting the credentials for the internal request, but figuring out which credentials to use.
          As I described above, I would like the credentials for requests issued by Solr-nodes themselves to be copied from the original super-request in cases where such exists - e.g. for search and update. For some of the requests that Solr-nodes issue themselves there is no such super-request (e.g. for sync stuff), and for others the sub-requests are issued asynchronously from their super-request (e.g. replica-creation requests are issued asynchronously from a create request to the Collection API). For both such kinds of requests we need some credentials to include. That's where configuring "internal credentials" is needed.

          If you were thinking about actually writing URLs like "http://user:password@host:port/" in ZK, that is not going to work, since username/password is not (necessarily) static per target-node. I want to "forward" whatever credentials are given in the super-request to the sub-requests triggered synchronously by this super-request (e.g. search and update), whereas "internal credentials" will be used when there is no such super-request (sync stuff) or when there is an asynchronous border between the super-request and the sub-requests (e.g. Collection API operations). Besides that, we plan to (later) go for HTTPS in order to encrypt the clear-text (or base64-encoded, which can easily be decoded) username/password in the requests, and I believe that the URL itself is not encrypted in HTTPS.

          Very concretely, in our usage of Solr we would (for now) like to have two users:

          • an Admin-user, which is allowed to do everything
          • a Search-user, which is only allowed to search

          We will configure the solr-nodes with the "Admin-user credentials" as "internal credentials", so they will be used for replica-creation and sync stuff. But "outside users" of our application will only be given the Search-user credentials, and we want to make sure that they are not allowed to do anything but search. It is not cool if a request made with the Search-user credentials results in sub-requests with Admin-user credentials.

          Hope it makes a little bit of sense - or else I hope it will when I provide some code. I will post a patch soon with an early version of my work.
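As a side note on the wire format discussed above (URL-embedded credentials, and base64 being trivially decodable): a minimal plain-JDK sketch of what a Basic Authorization header actually carries. This is standard Basic-scheme behaviour, not Solr-specific code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: what basic http auth puts on the wire. The header value
// is just base64("user:password"), which anyone can decode - hence the plan
// to move to HTTPS later.
public class BasicAuthHeader {

    // Build the Authorization header value for the given credentials.
    public static String headerValue(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // Recover the original "user:password" token from a header value.
    public static String decode(String headerValue) {
        byte[] raw = Base64.getDecoder()
                .decode(headerValue.substring("Basic ".length()));
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

So a header of `Basic dXNlcjpwYXNzd29yZA==` decodes straight back to `user:password`; base64 here is encoding, not encryption.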

          Jan Høydahl added a comment -

          This is a duplicate of SOLR-4407. Seems like all the discussions happen here, so I'll go ahead and mark 4407 as duplicate.

          Per Steffensen added a comment - - edited

          I think you should be able to specify credentials both at SolrServer level (all requests made through it will have the same credentials added) and at SolrRequest level (so that you can use the same SolrServer for sending requests with different credentials). I added a credentials field on SolrRequest, and it is all fine if you create the SolrRequest object "yourself", but unfortunately there is a set of "helper"-methods on SolrServer that basically create the SolrRequest object for you without giving you a chance to modify it afterwards. How do we prefer to "hand over" credentials for those SolrRequests? Ideas off the top of my head:

          • 1) Add a credentials param to all the "helper"-methods (maybe make two versions of each method - one that does and one that does not take a credentials object)
          • 2) Change the SolrRequest constructor so that it supports reading credentials from a thread-local, which you set before calling one of the "helper"-methods (instead of providing it as a parameter to the "helper"-method)

          I wouldn't want to do 1) before it is agreed by the community, and 2) is kinda hacky (even though I like to use thread-locals a lot more than the average developer seems to). It seems like 1) was used back when "commitWithinMs" was added, but maybe it is not the way to continue - we will end up with a huge set of similar (except for parameter differences) "helper"-methods. Actually I would have preferred that "commitWithinMs" was never done this way - maybe one should have foreseen that this would not be the last parameter you want to be able to give to the "helper"-methods in general, and introduced a callback-thingy instead of the "commitWithinMs" parameter - a callback that could be used to modify the SolrRequest object after it has been set up by the "helper"-method.

          • 3) Of course that is also an option now, but then we should really get rid of "commitWithinMs" and that will break API backwards compatibility.
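A minimal standalone sketch of option 2), the thread-local hand-over. All names here (CredentialsContext, Credentials) are hypothetical illustrations, not SolrJ API.

```java
// Hypothetical sketch of option 2): the caller parks credentials in a
// thread-local before invoking a SolrServer "helper"-method, and the
// SolrRequest constructor would pick them up (and clear them) from there.
public class CredentialsContext {

    public static final class Credentials {
        public final String user;
        public final String password;

        public Credentials(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    private static final ThreadLocal<Credentials> CURRENT = new ThreadLocal<>();

    // Called by client code just before invoking a helper-method.
    public static void set(Credentials credentials) {
        CURRENT.set(credentials);
    }

    // Would be called from the SolrRequest constructor; clears the slot so
    // stale credentials never leak into the next request on this thread.
    public static Credentials consume() {
        Credentials c = CURRENT.get();
        CURRENT.remove();
        return c;
    }
}
```

The clear-on-read behaviour is the part that makes this "kinda hacky": forget to set before a helper call, and the request silently goes out without credentials.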
          Per Steffensen added a comment - - edited

          Usually you structure URLs in your web-app by "increasing level of detail" from left to right. This way you can configure the web-container to handle the most obvious security constraints.

          Unfortunately, IMHO, Solr URLs are not structured by "increasing level of detail" - e.g. "/solr/collection1/update" should have been "/solr/update/collection1" (I consider "collection/replica" a higher level of detail than "update"). Due to limitations on "url-pattern"s in "security-constraint|web-resource-collection"s in web.xml (or webdefault.xml for jetty) you cannot write e.g. <url-pattern>/solr/*/update</url-pattern>, <url-pattern>*/update</url-pattern> or <url-pattern>*update</url-pattern> to protect updates - it is not allowed - you can only have * at the end or as part of an extension pattern ("*.ext" - and the dot needs to be there).

          Therefore it is not possible (AFAIK) to configure the web-container to protect the "update" operation or "select" operation etc. You can configure protection of "all operations" for a specific collection (but not specific operations across collections), but it is much less likely that that is what you want to do. Alternatively, by mentioning <url-pattern>/solr/<collection-name>/update</url-pattern> for every single collection in your setup, you can actually protect e.g. update, but that is not possible for those of us who have a dynamic/ever-changing set of collections.
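For illustration, this is the kind of constraint web.xml *can* legally express - a prefix pattern per named collection (role name hypothetical). There is no legal counterpart matching "/solr/*/update" across all collections.

```xml
<!-- Protects every operation under one specific, named collection. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>collection1-all-ops</web-resource-name>
    <!-- '*' is only legal at the end (prefix) or as "*.ext" (extension). -->
    <url-pattern>/solr/collection1/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>
```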

          Possible solutions off the top of my head:

          • 1) Make Solr URL structure "right" - e.g. "/solr/update/collection1"
          • 2) Accept that obvious security constraints like "protecting update" or "protecting search" etc. cannot be done by web.xml configuration, and leave it up to "programmatic protection"

          I like 1) best, but is that at all feasible, or will it just be way too much work?
          Since Solr is usually not something you change yourself, but something you use out-of-the-box - potentially modifying deployment descriptors (e.g. web.xml), config files etc. - 2) will really not help "the normal Solr user". It will also be a problem figuring out exactly where to place this "programmatic protection" code, because even though most Solr stuff is handled by SolrDispatchFilter, there are several resources that are not handled through it.

          Jan Høydahl added a comment -

          1) Make Solr URL structure "right" - e.g. "/solr/update/collection1"
          2) Accept that obvious security constraints like "protecting update" or "protecting search" etc. cannot be done by web.xml configuration, and leave it up to "programmatic protection"

          I think 1) is a non-starter, because someone may have another use case, namely assigning collections to different customers, and thus collection is more important than action. It all boils down to this: you should trust those that you authenticate to access your server enough for them not to muck around deleting indices or something. If you need a crazy-detailed authorization scheme, then put a programmable proxy in front of Solr, like Varnish or something!

          This issue should be about BASIC auth and perhaps certificate based auth, with the intention of blocking out people or machines that should not have access to search at all, versus those that should. Then it would be a completely different beast of a JIRA to add detailed authorization support.

          Per Steffensen added a comment -

          I think 1) is a non-starter

          Agree!

          This issue should be about BASIC auth and perhaps certificate based auth

          I will only do basic HTTP auth, but do it in a way where there is some kind of "framework", making it "easy" to add other kinds of authentication.

          with the intention of blocking out people or machines that should not have access to search at all, versus those that should

          Actually the focus is not so much on how to protect Solr, but on making sure you can make solr-to-solr internal requests work when/if you do.

          Per Steffensen added a comment - - edited

          then put a programmable proxy in front of Solr, like Varnish or something

          Uhhh, I think that is too much to require for doing fairly simple authorization. And depending on where you put Varnish, the authorization might not be performed on solr-to-solr requests - and one of the focus points here is to make sure it is, because we basically do not want a solr-to-solr request R2 performed as a direct reaction to a request R1 coming from "the outside", where the "outside user" is authorized to do R1 but not R2.

          But I guess it can fairly easily be done using a filter, and I will put up a description on the Wiki of how to do it. We are going to do it anyway in my project.
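A tiny standalone sketch of the path test such a filter would apply (pure JDK, no servlet API; the class name is hypothetical): the "protect update across all collections" rule that web.xml url-patterns cannot express.

```java
// Hypothetical helper for a programmatic-authorization filter: matches
// "/solr/<any-collection>/update..." - the cross-collection rule that a
// web.xml <url-pattern> cannot express.
public class UpdatePathMatcher {

    // True for the update handler of any collection (URI without query string).
    public static boolean isUpdateRequest(String uri) {
        return uri.matches("/solr/[^/]+/update(/.*)?");
    }
}
```

A filter would deny such requests unless the authenticated user holds the update role, and pass everything else through.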

          Jan Høydahl added a comment -

          So let this issue deal with authentication - if you have access then you have access to all - and make another JIRA for authorization.

          Per Steffensen added a comment -

          So let this issue deal with authentication - if you have access then you have access to all - and make another jira for authorization

          Basically yes! The decision on who can be authenticated and what they are authorized to do is not the focus of this JIRA. That is up to the admins of the Solr cluster to set up/program. This JIRA is about what credentials a Solr node will add to the requests it itself sends to other Solr nodes.

          Juggling with authentication/authorization setups is actually more of a matter in the tests I am making in the Solr test-suite, where I want to make a more "normal" kind of setup:

          • a search-user that can do searches only
          • an update-user that can do updates only
          • an admin-user that is allowed to do everything

          This cannot be achieved by configuring the web-container; you need to do programmatic authorization. I will figure out whether I will

          • 1) go for the above users/authorization wrt tests
          • 2) or make another cut where different users have access to do stuff on different limited sets of collections

          The thing is that a solr-node makes requests to other solr-nodes in A LOT of different places, and to make sure it all works I would really not want to make a protection-focused test-suite that basically tests everything (search, update, collection creation/deletion, peer-sync, recovery etc.) again - just with protection added. I would rather apply protection to the existing cloud/distributed test-suite (to make pretty sure every kind of solr-to-solr-request case is covered), and doing that will be much easier with 1) and 2).
          Per Steffensen added a comment -

          Doesn't the community have an opinion on https://issues.apache.org/jira/browse/SOLR-4470?focusedCommentId=13582144&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13582144

          For now I have made the ThreadLocal solution and it works fine, but I guess I/you would prefer another solution. I am thinking about replacing "int commitWithinMs", "boolean waitFlush, boolean waitSearcher, boolean softCommit", "boolean waitFlush, boolean waitSearcher, int maxSegments" etc. in the SolrServer "helper" methods with "UpdateRequestProps", "CommitRequestProps", "OptimizeRequestProps" etc.

          GeneralRequestProps:

          • field: Credentials credentials

          UpdateRequestProps extends GeneralRequestProps

          • field: int commitWithinMs

          CommitRequestProps extends GeneralRequestProps

          • field: boolean waitFlush
          • field: boolean waitSearcher
          • field: boolean softCommit

          OptimizeRequestProps extends GeneralRequestProps

          • field: boolean waitFlush
          • field: boolean waitSearcher
          • field: int maxSegments
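
          To make the proposal concrete, here is a minimal, hypothetical Java sketch of that hierarchy. The class and field names follow the list above, but nothing here exists in Solr; credentials are simplified to a plain String to keep the sketch self-contained.

```java
// Hypothetical sketch of the proposed request-props hierarchy.
// "Credentials" is modeled as a String instead of an AuthCredentials object.
class GeneralRequestProps {
    private final String credentials;
    GeneralRequestProps(String credentials) { this.credentials = credentials; }
    String getCredentials() { return credentials; }
}

class UpdateRequestProps extends GeneralRequestProps {
    private final int commitWithinMs;
    UpdateRequestProps(String credentials, int commitWithinMs) {
        super(credentials);
        this.commitWithinMs = commitWithinMs;
    }
    int getCommitWithinMs() { return commitWithinMs; }
}

class CommitRequestProps extends GeneralRequestProps {
    private final boolean waitFlush, waitSearcher, softCommit;
    CommitRequestProps(String credentials, boolean waitFlush,
                       boolean waitSearcher, boolean softCommit) {
        super(credentials);
        this.waitFlush = waitFlush;
        this.waitSearcher = waitSearcher;
        this.softCommit = softCommit;
    }
    boolean isSoftCommit() { return softCommit; }
}
```

          The point of the shape is that every helper-method variant would take one props object, so adding credentials (or any future per-request property) does not multiply the number of method overloads.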
          Per Steffensen added a comment -

          Dear committer eager to help with this issue

          Would you prefer a 4.x based patch or a 5.x based patch?

          Jan Høydahl added a comment -

          Any will do. If you do the dev on 4.1 then post that patch and just write what version it applies to. Then when it's time for commit we'll port to trunk.

          Or as the old saying goes - an out of date patch in Jira is better than no patch in Jira

          Per Steffensen added a comment - - edited

          Well, I developed this first and foremost to be used internally by my organization, so I developed on top of an internal 4.0.0+ version (the + being modifications we have not gotten committed yet - it is hard to get cool stuff committed if it involves a lot of changes - SOLR-3178). You will probably not be able to use a patch made from there. Therefore I need to move the changes to a clean Apache revision/branch anyway, and that is where you could have some influence on the choice. I will go for the 4.x branch.

          I would like to introduce a few new sayings

          • Patches in Jira fitting on top of recent revisions are better than patches in Jira fitting on top of old revisions - commit it while it's hot
          • Projects where you trust your test-suite enough to dare do major refactoring, are the only projects you can keep building on for a long time
          • When introducing a new feature, any developer should see it as his duty to do refactoring to make the code seem as if the new feature was thought into the product from the beginning
          Per Steffensen added a comment -

          Prepare for a new feature well covered by tests, with the following testing strategy:

          • 1) We run with "a common security setup" in most (all tests inheriting from BaseDistributedSearchTestCase) distributed/cloud tests
          • 2) There will be a new distributed/cloud test focusing on error-scenarios around security - authentication/authorization errors when you do not provide correct credentials

          I chose 1) because

          • I believe it is the easiest/best way to make sure support for A&A is correct in all inter-solr-node communication
          • It would increase time-to-run-test-suite a lot if I was going to introduce new A&A focused tests for all inter-solr-node communications (search/update/get-sub-requests, collection-API-sub-requests, recovery-sub-request etc.)
          • If we introduce a new feature requiring new inter-solr-node communication, it is likely that its test will inherit from BaseDistributedSearchTestCase (including the security setup out-of-the-box), and will therefore "remind" the developer to provide correct credentials for the inter-solr-node requests involved.

          I added 2) because with 1) we will have good coverage of cases where you provide "good" credentials in inter-solr-node communication, but the reason it works might be that security was somehow not enabled or not working correctly. Therefore I added error-scenarios making sure security is enabled and working, by asserting on authentication/authorization errors when you provide "bad or no" credentials.

          "Common security scheme" being

          • A search-user that is only allowed to do selects/gets
          • An update-user that is only allowed to do updates
          • An all/admin-user that is allowed to do everything
          • You are not allowed to do anything without providing credentials
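
          As a toy illustration of this operations-based scheme (the user names follow the list above, but the role mapping and helper class are invented for the example and are not part of the patch):

```java
import java.util.Map;
import java.util.Set;

// Toy model of the "common security scheme": each user maps to the set of
// operations it may perform; requests without credentials are always denied.
class CommonSecurityScheme {
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "search-user", Set.of("select", "get"),
        "update-user", Set.of("update"),
        "admin-user",  Set.of("select", "get", "update", "admin"));

    static boolean isAllowed(String user, String operation) {
        if (user == null) return false; // no credentials -> nothing allowed
        return ALLOWED.getOrDefault(user, Set.of()).contains(operation);
    }
}
```

          Note that this is authorization by "operations allowed" rather than "collections allowed" - exactly the distinction the comment draws, and the reason the web-container alone cannot express it.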
          Per Steffensen added a comment -

          When the patch has been committed (maybe before) I will add to http://wiki.apache.org/solr/SolrSecurity. Today, in practice, you cannot set up "Path based Authentication" (at least not on any URL/resource that is ever hit by inter-solr-node requests), because credentials are not provided in inter-solr-node requests. That will change with the patch, and the Wiki page will need a few additional words.

          Per Steffensen added a comment -

          My time was prioritized to work on something else. But now I am back on this issue, and hopefully I will provide patch soon.

          Per Steffensen added a comment - - edited

          About SOLR-4470_branch_4x_r1452629.patch

          • Fits on top of branch_4x revision 1452629
          • Fairly big patch, but most of it is to provide credentials across all requests issued by BaseDistributedSearchTest tests, or just to forward decisions on credentials down the call-flow on the server-side. Notes on the major changes in non-test code
            • AuthCredentials: New class encapsulating the concept of credentials. Supports "http basic" only for now, but using this object across code wherever credentials are needed, the framework will be in place for supporting other kinds of credentials in the future
            • InterSolrNodeAuthCredentialsFactory: Separation of concern with respect to "getting credentials for inter-solr-node requests". The implementation of the factory can be replaced to "change strategy", but this is only used in tests (for now)
            • InterSolrNodeAuthCredentialsFactory.AuthCredentialsSource: A level of abstraction to be used server-side whenever you have to decide on "how to get credentials for sub-requests" - possible decisions are "use credentials from super-request" or "use internal credentials" but NOT "use no credentials". Any server-side code should be able to provide credentials in requests, because you never know if the solr-cluster has been set up with protection on the URL you are about to hit. Therefore there are basically only the two choices listed above. The credentials (internal or on super-request) might be empty/null in concrete cases if no internal credentials have been specified or if no credentials were provided in the super-request, but you as a programmer never know, so you have to decide on "how to get credentials for sub-requests" for your server-side case. The idea is to "use credentials from super-request" whenever such exist, so that a request from "the outside" can never trigger a sub-request that the outside user was not allowed to do directly himself. Whenever there is no "super-request", use "internal credentials". There are gray-area cases, e.g. the Observer that asynchronously issues requests to the CoreAdmin API when the Collection API has been called "from the outside" - to be true to the idea, in this case we ought to use the credentials provided in the original request to the Collection API, but that would require writing those credentials into the ZK-queue, and I wouldn't include that now, so in this case we use internal credentials. Server-side methods should typically not take AuthCredentials parameters, because that will not encourage/remind developers to decide on the strategy when they use the method in the future - use AuthCredentialsSource instead.
            • SolrRequest: Has been modified so that it can hold credentials to be used, and info on whether or not to provide them preemptively
            • SolrQueryRequest: Has been modified to hold the credentials used in the request received
            • SolrRequestParsers: Has been modified to extract the credentials from the request and add it to SolrQueryRequest
            • HttpClientUtil: Has been changed to allow for specifying credentials and/or ClientConnectionManager when creating a HttpClient. Credentials to be used across all requests issued by this HttpClient. ClientConnectionManager to be able to share connection-manager across HttpClients - see reason below
            • HttpShardHandler(Factory): It is used in many places (and that's good) where sub-requests are issued - places doing "use internal credentials" but also places doing "use credentials from super-request" (SolrCmdDistributor) - and therefore HttpShardHandler(Factory) has been modified to support both.
            • LBHttpSolrServer: Changed to allow for specific HttpClient for each request, so that even the loadbalancer in HttpShardHandler can use different HttpClients for different requests. It should have been done this way from the beginning - it is strange to use the defaultHttpClient for loadbalancing request through a HttpShardHandler which has been created with a specific HttpClient to use.
            • Shared ClientConnectionManager instead of HttpClient: Several classes that used to have a HttpClient used for all requests (PeerSync, SyncStrategy and UpdateShardHandler) now have only a shared ClientConnectionManager. This is in order to be able to specify different credentials (on HttpClient level) for different requests issued through those components. Actually it is not absolutely necessary for PeerSync, because it always uses "internal credentials", but it is done anyway in order to allow HttpShardHandlerFactory to insist on getting a decision on "how to get credentials for sub-requests" even when you "bring your own client" - shard handlers ALWAYS need a strategy for providing credentials, because they are always used for inter-solr-node requests. I believe it is ok to replace a shared HttpClient with a shared ClientConnectionManager in those cases, because the main reason for sharing a HttpClient must be to share its connection-manager.
            • RegExpAuthorizationFilter: In order to be able to implement the most "common" kind of authorization (see descriptions below) in tests, RegExpAuthorizationFilter has been added for handling authorization. I have placed it in non-test code, even though it is only used in tests. This is due to the fact that it could very well be used by others in a real solr-setup - e.g. we will use it in my project. Will write a little bit about it on http://wiki.apache.org/solr/SolrSecurity.
            • HttpSolrServer: Has been modified to be able to provide credentials (from HttpClient or from request) - both preemptively and non-preemptively. Support for non-preemptive POST requests required a little twist also.
          • Regarding how to use the solution
            • You provide a solr-node with the "internal credentials" using VM-params like this
              java -jar ... -DinternalAuthCredentialsBasicAuthUsername=<username> -DinternalAuthCredentialsBasicAuthPassword=<password> ... start.jar
              

              This is not a very secure way if you do not trust people with access to the machine running the solr-node, e.g. because the credentials can be read just by doing a "ps" command. We will make a solution to that, but I guess there is no reason to send it here before this first patch is committed. It will probably be another ticket anyway.

            • You can specify credentials once and for all to be used in every request issued by a specific HttpClient. E.g. to do it for the HttpClient behind a CloudSolrServer
              HttpClientUtil.setAuthCredentials(cloudSolrServer.getLbServer().getHttpClient(), AuthCredentials.createBasicAuthCredentials(<username>, <password>));
              
            • You can also specify credentials on a per-request basis (this will override credentials on the HttpClient for this particular request). Just do the following to your SolrRequest before using it
              solrRequest.setAuthCredentials(AuthCredentials.createBasicAuthCredentials(<username>, <password>));
              
            • Preemptive authentication is supported (and is the default)
            • Non-preemptive authentication is also supported, on a per-request basis. Just do the following to your SolrRequest before using it
              solrRequest.setPreemptiveAuthentication(false);
              

              Have made changes to HttpSolrServer so that non-preemptive POST requests will also work. ConcurrentSolrServer enforces preemptive authentication.

          • SolrServer helper-methods (add, commit, optimize, rollback, deleteByQuery, deleteById, ping, query and queryAndStreamResponse) have been changed so that, in "the most advanced" variant, they support credentials as a parameter. I could have made a credentials variant for all existing variants of "the same operation", but that would make us end up with double the helper-methods. This meant that I had to call this version in cases where a no/few-parameters version of the helper-method used to be called, in all tests inheriting from BaseDistributedSearchTestCase, because I have decided to run with "common security" on all resources/urls in those tests. I really believe this is a good decision, for two reasons
            • It helped me a lot to find particular/hidden places in the solr-server code where inter-solr-node requests were issued, so that I am now pretty sure we deal with credentials in all inter-solr-node requests.
            • There is a great chance that future additions to the code where inter-solr-node requests are triggered will have a test inheriting from BaseDistributedSearchTestCase - a test which will therefore run with "common" security set up - and therefore it is likely that such a test will make the developer discover that he needs to deal with credentials for the inter-solr-node requests. It's my best attempt to ensure that this feature is not broken in the future.
          • Introduced close on PeerSync and called it wherever needed. I needed it at some point during my development work. Not sure if it is needed anymore for this particular issue, but why not get it done.
          • Regarding test
            • Idea is to run with "common security" in all "distributed" tests (all tests inheriting from BaseDistributedSearchTest), and therefore I needed to add credentials here and there in all the sub-classes, in cases where they do not use the helper-methods (del, commit, add, etc.) in BaseDistributedSearchTest itself. Support for running with "common" security has been added to JettySolrRunner, and the feature is "turned on" for all tests based on BaseDistributedSearchTest
            • In this "common security" the web-container handles authentication (required for everything), but authorization is handled by RegExpAuthorizationFilter, in order to achieve what I consider the most "common" kind of authorization - namely authorization based on "operations allowed" instead of "collections allowed". In this "common security" setup used in BaseDistributedSearchTest tests, we have an all/internal/admin-user which is authorized to do anything, and search- plus update-users allowed to do only searches and updates respectively.
            • Collection handling helper-methods (createCollection, checkCollectionExpectations, checkForCollection, checkCollectionIsNotCreated and checkForMissingCollection) and getCommonCloudSolrServer moved from BasicDistributedZkTest to AbstractFullDistribZkTestBase, because I also wanted to use them in new SecurityDistributedTest
          • Additional
            • There is more to say about "security" on a "users-guide" level - will add to http://wiki.apache.org/solr/SolrSecurity. There is also something to be said on a "developers-guide" level. Do we have a developers guide somewhere on the Wiki?
            • Regarding running the test-suite: If not already granted in your policy file you need to grant "permission javax.security.auth.AuthPermission "*";" in general or at least for the JettySolrRunner (or the anonymous MappedLoginService in JettySolrRunner). This permission is needed in order to be able to set up the LoginModule in JettySolrRunner (test-only)
            • Haven't run "ant precommit", but I would if I were to commit myself. Guess it will fail due to the fact that I probably do not live up to the (very bad IMHO) coding-style

          Please do not hesitate to ask questions about why I changed particular things the way I did and why it is needed. Or just ask questions in general. But please consider getting the patch in soon while it's hot - it's good stuff.
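
          For reference, the HTTP Basic credentials that such requests carry are just a Base64-encoded "username:password" pair in the Authorization header, which is why preemptive authentication can simply attach the header up front. A minimal sketch of building that header value (the helper class is invented for illustration, not part of the patch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds the value of the HTTP "Authorization" header for Basic auth:
// "Basic " + base64(username + ":" + password).
class BasicAuthHeader {
    static String value(String username, String password) {
        String token = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }
}
```

          Sending this preemptively saves the 401-challenge round-trip, at the cost of always transmitting the (trivially decodable) credentials - one reason Basic auth only makes sense over a trusted network or TLS.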

          Show
          Per Steffensen added a comment - - edited About SOLR-4470 _branch_4x_r1452629.patch Fits on top of branch_4x revision 1452629 Fairly big patch but most of it is to provide credentials across all requests issued by BaseDistributedSearchTest tests or to just to forward decisions on credentials down the call-flow on server-side. Notes on the major changes in non-test code AuthCredentials: New class encapsulating the concept of credentials. Supports "http basic" only for now, but using this object across code wherever credentials are needed, the framework will be in place for supporting other kinds of credentials in the future InterSolrNodeAuthCredentialsFactory: Separation of concern with respect to "getting credentials for inter-solr-node requests". The implementation of the factory can be replaced to "change strategy", but this is only used in tests (for now) InterSolrNodeAuthCredentialsFactory.AuthCredentialsSource: A level of abstraction to be used server-side whenever you have to decide on "how to get credentials for sub-requests" - possible decisions are "use credentials from super-request" or "use internal credentials" but NOT "use no credentials". Any server-side code should be able to provide credentials in requests, because you never know if the solr-cluster has been set up with protection on the URL you are about to hit. Therefore there are basically only the two choices listed above. The credentials (internal or on super-request) might be empty/null in concrete cases if no internal credentials has been specified or of no credentials was provided in the super-request, but you as a programmer never know, so you have to decide on "how to get credentials for sub-requests" for your server-side case. Idea is to "use credentials from super-request" whenever such exists, so that a request from "the outside" can never trigger a sub-request that the outside user was not allowed to do directly by himself. 
Whenever there is no "super-request" use "internal credentials". There are gray-area-cases like e.g. the Observer that asynchronously issues requests to CoreAdmin API when the Collection API has been called "from the outside" - to be true to the idea, in this case, we ought to actually use the credentials provided in the original request to Collection API, but that would require writing those credentials into the ZK-queue, and I wouldn't include that now, so in this case we use internal credentials. Server-side methods should typically not take AuthCredentials-parameters, because that will not encourage/remind developers to decide on the strategy when they use the method in the future - use AuthCredentialsSource instead. SolrRequest: Has been modified so that it can hold credentials to be used, and info on whether or not to provide them preemptively SolrQueryRequest: Has been modified to hold the credentials used in the request received SolrRequestParsers: Has been modified to extract the credentials from the request and add it to SolrQueryRequest HttpClientUtil: Has been changed to allow for specifying credentials and/or ClientConnectionManager when creating a HttpClient. Credentials to be used across all requests issued by this HttpClient. ClientConnectionManager to be able to share connection-manager across HttpClients - see reason below HttpShardHandler(Factory): It is used many places (and thats good) where sub-request are issued, places doing "use internal credentials" but also places "use credentials from super-request" (SolrCmdDistributor) and therefore HttpShardHandler(Factory) has been modified to support both. LBHttpSolrServer: Changed to allow for specific HttpClient for each request, so that even the loadbalancer in HttpShardHandler can use different HttpClients for different requests. 
It should have been done this way from the beginning - it is strange to use the defaultHttpClient for loadbalancing request through a HttpShardHandler which has been created with a specific HttpClient to use. Shared ClientConnectionManager instead of HttpClient: Several classes that used to have a HttpClient used for all requests (PeerSync, SyncStrategy and UpdateShardHandler), now have only a shared ClientConnetionManager. This in order to be able to specify different credentials (on HttpClient level) for different requests issued through those components. Actually it is not absolute necessary for PeerSync because it always uses "internal credentials", but it is done anyway in order to allow HttpShardHandlerFactory to insist on getting a decision on "how to get credentials for sub-requests" even when you "bring you own client" - shardhandlers ALWAYS need to have a strategy for providing credentials, because it is always used for inter-solr-node request. I believe it is ok to replace a shared HttpClient with a shared ClientConnectionManager in those cases, because the main reason for sharing a HttpClient must be to share its connection-manager. RegExpAuthorizationFilter: In order to be able to implement the most "common" kind of authorization (see descriptions below) in tests, RegExpAuthorizationFilter has been added for handling authorization. I have placed it in non-test code, even though it is only used in tests. This is due to the fact that it could very well be used by others in a real solr-setup - e.g. we will use it in my project. Will write a little bit about it on http://wiki.apache.org/solr/SolrSecurity . HttpSolrServer: Has been modified to be able to provide credentials (from HttpClient or from request) - both preemptively and non-preemptively. Support for non-preemptive POST requests required a little twist also. Regarding how to use the solution You provide a solr-node with the "internal credentials" using VM-params like this java -jar ... 
-DinternalAuthCredentialsBasicAuthUsername=<username> -DinternalAuthCredentialsBasicAuthPassword=<password> ... start.jar This is not a very secure way if you do not trust people with access to the machine running the solr-node, e.g. becuase the credentials can be read just by doing a "ps" command. We will make a solution to that, but I guess there is no reason send it here, before this first patch is committed. It probably will be another ticket anyway. You can specify credentials once and for all to be used in every request issued by a specific HttpClient. E.g. to do it for the HttpClient behind a CloudSolrServer HttpClientUtil.setAuthCredentials(cloudSolrServer.getLbServer().getHttpClient(), AuthCredentials.createBasicAuthCredentials(<username>, <password>)); You can also specify credentials on a pre-request basis (will override credentials on the HttpClient for this particular request). Just do the following to your SolrRequest before using it solrRequest.setAuthCredentials(AuthCredentials.createBasicAuthCredentials(<username>, <password>)); Preemptive authentication supported (and is default) Non-preeemptive authentication also supported on a per-request basis. Just do the following to your SolrRequest before using it solrRequest.setPreemptiveAuthentication( false ); Have made changes to HttpSolrServer so that non-preemptive POST request will also work. ConcurrentSolrServer enforces preemptive authentication. SolrServer helper-methods (add, commit, optimize, rollback, deleteByQuery, deleteById, ping, query and queryAndStreamResponse) have been change so that they in "the most advanced" variant support credentials as a parameter. I could have made a credentials variant for all existing variants of "the same operation", but that would make us end up with double the helper-methods. 
This meant that I had to call this version in cases where a non/few-parameters version of the helper-method uses to be called in all test inheriting from BaseDistributedSearchTestCase, because I have decided to run with "common security" on all resources/urls in those tests. I really believe this is a good decision for two reasons It helped me a lot for find particular/hidden places in the solr-server code where inter-solr-node requests where issued, so that I am now pretty sure we deal with credentials in all inter-solr-node requests. There is a great chance that future additions to the code where inter-solr-node requests are triggered, will have a test inheriting from BaseDistributedSearchTestCase, a test which will therefore run with "common" security set up, and therefore it is likely that such a test will make the developer discover that he needs to deal with credentials for the inter-solr-requests. Its my best attempt to try to ensure that this feature is not broken in the future. Introduced close on PeerSync and called it wherever needed. I needed it some time in my development work. Not sure if it is needed anymore for this particular issue, but why not get it done. Regarding test Idea is to run with "common security" in all "distributed" tests (all tests inheriting from BaseDistributedSearchTest) and therefore I needed to go add credentials here and there in all the sub-classes, in cases where they do not use the helper-methods (del, commit, add, etc) in BaseDistributedSearchTest itself. Support for running with "common" security has been added to JettySolrRunner and the feature is "turned on" for all test based on BaseDistributedSearchTest In this "common security" the web-container handles authentication (required for everything), but authorization is handled by RegExpAuthorizationFilter, in order to achieve what I consider the most "common" kind of authorization - namely authorization based on "operations allowed" instead of "collections allowed". 
In this "common security" setup used in BaseDistributedSearchTest tests, we have an all/internal/admin-user which is authorized to do anything, and search- plus update-users allowed to do only searches and updates respectively. Collection handling helper-methods (createCollection, checkCollectionExpectations, checkForCollection, checkCollectionIsNotCreated and checkForMissingCollection) and getCommonCloudSolrServer moved from BasicDistributedZkTest to AbstractFullDistribZkTestBase, because I also wanted to use them in new SecurityDistributedTest Additional There is more to say about "security" on a "users-guide" level - will add to http://wiki.apache.org/solr/SolrSecurity . There is also something to be say on a "developers-guide" level. Do we have a developers guide somewhere on Wiki? Regarding running the test-suite: If not already granted in your policy file you need to grant "permission javax.security.auth.AuthPermission "*";" in general or at least for the JettySolrRunner (or the anonymous MappedLoginService in JettySolrRunner). This permission is needed in order to be able to set up the LoginModule in JettySolrRunner (test-only) Havnt run "ant precommit" but I would if I where to commit myself. Guess it will fail due to the fact that I probably do not live up to the (very bad IMHO) coding-style Please do not hesitate go ask questions about why I changed particular things the way I did and why it is needed. Or just ask questions in general. But please consider getting the patch in soon while its hot - its good stuff.
          Mark Miller added a comment -

          Haven't run "ant precommit" but I would if I were to commit myself.

          You could run it anyway.

          Guess it will fail due to the fact that I probably do not live up to the (very bad, IMHO) coding style.

          Precommit doesn't check coding style, but that's about where I lost my interest in continuing my review. Perhaps you can catch another committer's attention.

          Jan Høydahl added a comment -

          Wow. That was some bulk of information! Skimmed the code - very good that you have extensive test coverage, some JavaDocs etc. It would in my eyes be wiser to split the elephant in order to attract committer attention. Is there some major part you can leave out, and start with the simplest possible codebase to enable simple auth on internal requests only? Then once that is reviewed and committed (to trunk), you could add the rest in a few more steps. I'm afraid there is a risk this patch will rot and become stale and un-applyable very quickly, both because it is too much in one bite and because it touches so many lines that it will not apply cleanly for many days in a row.

          Another option is to do the finalization of auth in a new branch. Just a pity non-committers can't work with branches in svn. I could create a branch off trunk for you and commit your code to it if you're willing to make it apply. Then everyone can more easily checkout the branch, test, create new JIRAs with diffs against the branch and get some Heavy Committing™.

          PS: Please run ant precommit - You'll want Mark & Yonik on your side here

          Mark Miller added a comment -

          PS: Please run ant precommit - You'll want Mark & Yonik on your side here

          Oh, I don't mind if he runs precommit or not - I don't think contributors need to. Jenkins will catch these things, or a committer will run it. I was just suggesting there is nothing stopping him from running it, whether he commits it or not.

          I've just read enough little negative jabs through all of the issues I've interacted with Per on that I'm ready to let someone else work with him if they desire. Getting into security like this has usually been avoided with Solr - coming in with a bad attitude just seems counter productive.

          I'm going to spend my time helping less prickly contributors.

          Per Steffensen added a comment - edited

          You could run it anyway.

          I will bring a patch passing precommit shortly

          Perhaps you can catch another committers attention.

          Hopefully

          It would in my eyes be wiser to split the elephant

          I agree, but it is hard, because the changes in the actual non-test code are not that comprehensive. I will try though, if at least one committer indicates that he will take the patch seriously and spend some time on this issue.

          Is there some major part you can leave out and start with the simplest possible codebase to enable simple auth on internal requests only?

          Actually that is all it is already - besides that, I fixed the PeerSync.close issue, which is just a few lines. Most of the patch is related to testing, and I don't like to deliver parts that are not thoroughly tested.

          Another option is to do the finalization of auth in a new branch

          We could do that, but will it make a difference? The end goal is to have it on a branch from which an actual release will be built. The changes will become hard to merge very soon anyway - won't they?

          Just a pity non-committers can't work with branches in svn

          Well, I guess it wouldn't be that hard if the branch is made from branch_4x revision 1452629 - the patch will fit without any problems.

          I could create a branch off trunk for you and commit your code to it if you're willing to make it apply

          I don't know the magnitude of the diff between branch_4x and trunk - according to Mark they were almost equal some time ago, but that might not be the case anymore? Besides, you indicated earlier that it wouldn't matter whether it was a branch_4x or a trunk patch. But I think I would be able to create a patch fitting trunk fairly easily. Again, there is no need for me to do this if no committer is willing to actually take this patch seriously and work a little bit with it. But if you think it makes a difference, Jan, I will do it? Please also consider just making the branch from branch_4x - then the patch will fit without problems.

          Oh, I don't mind if he runs precommit or not - I don't think contributors need to.

          That was my impression too

          I've just read enough little negative jabs through all of the issues I've interacted with Per on that I'm ready to let someone else work with him if they desire. Getting into security like this has usually been avoided with Solr - coming in with a bad attitude just seems counter productive.

          Maybe it is just because I actually give a damn. Unfortunately I have to say that I do not believe Solr(Cloud) will become serious competition to ElasticSearch (and others) if the quality of code and committer attitude do not change. The code is a mess and will not be maintainable for very long if the way of working does not change. The main reason is that apparently no one ever refactors. It is all just small patches on top of already messy code, making it even messier. It is like you are not allowed to do natural refactoring because it makes a patch bigger than the absolute minimum needed to introduce the new feature. And apparently we do not trust the test-suite enough to say that if it is green after a patch (and you didn't actually delete, modify or ignore existing tests in a suspicious manner), nothing was broken. It is an illusion that people will ever do refactor/cleanup-only patches - refactoring/cleanup is something you do as a natural thing when touching the code for other reasons. And even if someone actually did a refactor/cleanup-only patch, it would probably be too big by itself to make it into the code anyway.

          The small jabs are continuous attempts to influence things that I believe are bad around Solr, and it IS because I give a damn, so IMHO the bad attitude is not on my side, it is on (some of) the Solr-core-members' side. Doing retrospective sessions among Solr-core-members might be a good idea, to at least consider what works well and what might need some twisting (process- and code-requirements-wise). And I am a little surprised that this single last line in my thorough description above was the only one that really got some attention. It is a one-line small jab in a big productive description.

          Of course I accept current rules and agreed-ways-to-do-things, but it will not stop me trying to change it wherever I think it is currently wrong. Because I give a damn.

          Maybe Solr-core-members also have something to learn from developers outside the Solr core. I have a lot of experience as a developer, architect, mentor, teacher etc., and I have strong opinions based on experience. But I am always ready to listen to arguments and learn, and you have heard me (several times) say "I stand corrected" when I believe I learned something. I expect others to be ready to listen and learn too. Arguments count - not "counter productive" attitudes like "we have always done it this way and it is the right thing to do" and "people trying to change things are just counter productive".

          You'll want Mark & Yonik on your side here

          Yes, certainly, but I probably didn't make progress in that area with the statements above.

          Per Steffensen added a comment - edited

          New patch. Passes precommit if you do the following:

          cd <checkout-of-branch_4x-r1452629>
          patch -p0 -i SOLR-4470_branch_4x_r1452629.patch
          svn stat | grep "^?" | awk '{print $2}' | xargs svn add
          svn propset svn:eol-style native solr/core/src/java/org/apache/solr/security/InterSolrNodeAuthCredentialsFactory.java
          svn propset svn:eol-style native solr/core/src/java/org/apache/solr/servlet/security/RegExpAuthorizationFilter.java
          svn propset svn:eol-style native solr/core/src/test/org/apache/solr/cloud/SecurityDistributedTest.java
          svn propset svn:eol-style native solr/solrj/src/java/org/apache/solr/security/AuthCredentials.java
          svn propset svn:eol-style native solr/test-framework/src/java/org/apache/solr/cloud/InterSolrNodeAuthCredentialsFactoryTestingHelper.java
          ant precommit
          

          Only problem is that it says "Linting documentation HTML is not supported on this Java version (1.6) / JVM (Java HotSpot(TM) 64-Bit Server VM).", but I guess that is because I am not "is.jenkins.build", and I do not know what to do about that on my local machine. If I comment out the stuff in target "-documentation-lint-unsupported" in lucene/common-build.xml it passes "ant precommit"

          Besides that, the new patch contains a small modification to RegExpAuthorizationFilter so that it works in both a "real" jetty and a "test" jetty (started by JettySolrRunner). Strangely, HttpServletRequest.getPathInfo() returns the correct path in the "test" jetty but returns null in the "real" jetty, while HttpServletRequest.getServletPath() returns the correct path in the "real" jetty but returns empty/null in the "test" jetty - so now we use whichever is non-null/empty.
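The non-null/empty fallback described above amounts to something like the following minimal sketch (the class and helper names here are made up for illustration):

```java
// Illustrative sketch: different Jetty setups populate getServletPath()
// vs getPathInfo(), so take whichever actually carries the path.
public class PathFallback {
    static String effectivePath(String servletPath, String pathInfo) {
        if (servletPath != null && !servletPath.isEmpty()) {
            return servletPath;
        }
        return pathInfo;
    }

    public static void main(String[] args) {
        // "real" jetty case: getServletPath() set, getPathInfo() null
        System.out.println(effectivePath("/solr/admin", null)); // prints /solr/admin
        // "test" jetty case: getServletPath() empty, getPathInfo() set
        System.out.println(effectivePath("", "/solr/admin"));   // prints /solr/admin
    }
}
```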

          Jan Høydahl added a comment -

          Please use JIRA to discuss the implementation.

          I'm tired of long off-topic discussions here in JIRA, if you need to ventilate, please start a new thread on the dev-list.

          Jan Høydahl added a comment -

          It is like you are not allowed to do natural refactoring because it makes a patch bigger than the absolute minimum to introduce the new feature.

          Please read the Do's and Don'ts in http://wiki.apache.org/solr/HowToContribute#Creating_the_patch_file
          It's not about forbidding this or that, it's just that if a single patch touches too much, history shows that it is hard to maintain and relate to for all the different persons, both committers and contributors. So cohesion matters. There are many examples of patches in the past doing only refactoring or only formatting changes. Those are welcome too, but it tends to be more efficient to first check in an API change which is necessary for a new feature, and then check in the core parts of that feature, and later check in the rest.

          Also, if a change is considered major or risky we have historically started in a branch, then when mature merged in to trunk, then let it bake with nightly builds and other feedback for a few months before back-porting to the current stable branch. Chances are we'd want something similar here too, and that's why I suggested a branch off of trunk to start with.

          Rather than trying to book a certain committer up-front to promise to work on the code, you should incrementally continue work on it yourself, listening to feedback and reworking the patch/branch... It's more likely to attract a committer helping with the last mile of something which has obviously been in the making and being improved for a long time, than up-front commitments.

          My 2¢

          Jan Høydahl added a comment -

          So a first suggestion for one thing you could start with: factor out the fix for the PeerSync.close issue into its own patch and get that committed, since it is a side-issue.

          Per Steffensen added a comment -

          Added a few (well, a lot of) words to http://wiki.apache.org/solr/SolrSecurity. Elaborating both on setting up and using security in general, and on how my patch will change things. The "with SOLR-4470" and "without SOLR-4470" parts can be removed when/if the patch gets committed. Please do not delete the SOLR-4470 related descriptions, because someone might want to use the patch and understand what it does for them, even if it is never committed.

          Please read the Do's and Don'ts in http://wiki.apache.org/solr/HowToContribute#Creating_the_patch_file

          I did, but just because things are mentioned there, it does not make them less "bad" or "good". Actually I am trying to change the "rules" there and in the minds of committers in general

          It's not about forbidding this or that, it's just that if a single patch touches too much, history shows that it is hard to maintain and relate to for all the different persons, both committers and contributors.

          I understand and agree that this is a valid downside of big patches, but the alternative apparently is that refactoring is not done - enough.

          There are many examples of patches in the past doing only refactoring or only formatting changes

          Glad to hear that this happens. But apparently not enough - look at the code! - structure, separation of concerns, etc.

          Also, if a change is considered major or risky we have historically started in a branch

          That is a nice strategy, and my patch might be considered major or risky to some. I am completely in on the branch thingy. You are very welcome to branch and add the patch - let me know if I can help in any way. I just fear that it will sit on the branch and rot - and basically I do not care where the patch rots. I am interested in getting it committed and used in a future release, so that others can benefit, and so that there are fewer potential merge conflicts when we want to upgrade our "local version of Solr" (which includes this patch, because we need it in production) to include new code from Apache Solr.

          Rather than trying to book a certain committer up-front to promise to work on the code you should incrementally continue work on it yourself, listening to feedback, rework the patch/branch

          I would like to, but I do not want to put in the work as long as I fear that no one wants to work with me, no matter what I do. I didn't ask for a promise, just an indication that some committer (at least at the time being) has the intention to participate.

          And this way of working will make it a very big and long-term process to get the complete patch in. I am not hired to work on Solr, but to deliver a product to a customer, a product which happens to be based on Solr. I can do a perfectly nice piece of work introducing this feature in a week or two, for it to be used in our application. My customer is glad to pay me for doing that, but he does not understand how I can make the change in a week or two and see it work in production, yet have to spend half a year getting it into Apache Solr - they ought to be happy that we already contributed a few weeks of work to introduce a feature that works, he thinks.

          So a first suggestion for one thing you could start with is factoring out the fix for the PeerSync.close issue into an own patch and get that committed, since it is a side-issue.

          I might do that, but it's only a few lines and will only reduce the patch by a few percent, so not even you believe that will make a difference.

          Jan Høydahl added a comment -

          I might do that, but it's only a few lines and will only reduce the patch by a few percent, so not even you believe that will make a difference.

          Actually, the reason I comment at all is that I'm fairly interested in this feature myself on behalf of a few customers, so I'd love to see it succeed, but right now this is too big for my time and priorities to follow through all the way. If it were split in 3 I could probably take one of them.

          Back to discussing the architecture here:

          To me the approach taken looks sane and not too intrusive.

          Even if solr.xml is going away, I think it would make sense to include username & password config options in solr.xml as an alternative to passing them as JAVA_OPTS. You'll quickly see how that is done, and using ${var} syntax with fallback to a pre-defined default, you can choose whether to supply them as JAVA_OPTS or directly in solr.xml. The solr.xml approach would be less controversial to some I guess. Once solr.xml is nuked, the params will be moved to whatever takes its place.
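If the solr.xml route were taken, it could presumably look something like the following sketch. The element and property names here are purely illustrative, not actual Solr options; the point is Solr's ${property:default} substitution syntax, which lets the value come from a -D system property in JAVA_OPTS or fall back to a default given in the file:

```xml
<!-- Hypothetical sketch only: the element names are illustrative.
     With -DinternalAuthUsername=... set, the system property wins;
     otherwise the default after the colon is used. -->
<solr>
  <str name="internalAuthUsername">${internalAuthUsername:solr-internal}</str>
  <str name="internalAuthPassword">${internalAuthPassword:changeme}</str>
</solr>
```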

          Hide
          Per Steffensen added a comment -

          Actually I started out making support for providing "internal credentials" in either solr.xml or as VM params. I removed the solr.xml thing again because we are not going to use it and it would just make the patch even bigger (and we would not like that). I can dig the solr.xml solution up again, but why not make that a separate jira on top of the current VM-params-only solution. No problem.

          Actually in the long run we are not going to use the VM params solution either, because credentials can be seen by issuing a "ps" command. But dealing with that will also be another jira.

          We are not going to use the simple realm.properties based realm that comes with jetty. We have made our own realm using crypto hashed credentials persisted in Zk. Seemed to be a good solution now that Zk is around anyway, and because Zk has a nice watcher feature where realms running in many jettys can receive push information about changes in the underlying credentials/roles.

          Jan Høydahl added a comment -

          I removed the solr.xml thing again because we are not going to use it and it would just make the patch even bigger (and we would not like that).

          Agree, keep it simple and iterate from there

          I tried to apply the patch to trunk and there were not as many mismatches as I thought. So although I created a branch https://svn.apache.org/repos/asf/lucene/dev/branches/security/ we may not need to use it before we feel the need for it, as long as the patch keeps applying well for longer periods of time. Will attach trunk patch as SOLR-4470.patch

          Regarding running the test-suite: If not already granted in your policy file you need to grant "permission javax.security.auth.AuthPermission "*";" in general or at least for the JettySolrRunner (or the anonymous MappedLoginService in JettySolrRunner). This permission is needed in order to be able to set up the LoginModule in JettySolrRunner (test-only)

          Hmm, it took some trial & error before I understood why tests failed, then I saw this comment and changed my java.policy file. But how can this be better integrated with Ant so that patching java.policy for all your JDKs is not a requirement for passing Lucene/Solr tests? I think such a requirement may be a non-starter. Is it an alternative to simply modify Jetty's xml file instead of doing it through the API?

          Regarding the use of "|" as separator in RegExpAuthorizationFilter

            <init-param>
              <param-name>search-constraint</param-name>
              <param-value>1|update-role,admin-role|^.*/update$</param-value>
            </init-param>
          

          Will it work with regexes containing "|"? Or would it be more readable (and cool) in these JSON-times to use a small object as param-value? Then we could keep the base object syntax for any future Filter implementations.

              {"order": 1, "roles": ["update-role","admin-role"], "regex": "^.*/update$"}
          
          Jan Høydahl added a comment -

          Attached patch for trunk

          Jan Høydahl added a comment -

          One comment

          By adding a parameter to the utility methods without leaving the old signature in place, you break back-compat with people using the old signature.

          -  public UpdateResponse add(SolrInputDocument doc, int commitWithinMs) throws SolrServerException, IOException {
          +  public UpdateResponse add(SolrInputDocument doc, int commitWithinMs, AuthCredentials authCredentials) throws SolrServerException, IOException {
          

          We should either ADD the new methods or simply skip adding this to convenience signatures. Users wanting auth can do so with the other (more elaborate) technique, they do not need a convenience method for auth.

               UpdateRequest req = new UpdateRequest();
               req.add(doc);
               req.setAuthCredentials(authCredentials);
          

          One can argue that commitWithin is something most users want to / should use, while auth will be more of a corner case.

          Per Steffensen added a comment -

          as long as the patch keeps applying well for longer periods of time

          But I guess it will not, because there is a fair amount of changes to existing code. Even though I am very eager to do "separation of concerns" - and since authentication is a separate concern I have tried to isolate as much as I can in new separate classes like AuthCredentials and InterSolrNodeAuthCredentialsFactory - there is still a fair amount of changes in existing classes:

          • 1) Of course the new classes need to be used around the places in the code where we actually issue inter-solr-node requests. Unfortunately that is more places than it ought to be, but not that many.
          • 2) There are a lot of changes in tests inheriting from BaseDistributedSearchTest, but they are all trivial - it's just a matter of applying credentials to the many places where requests are sent
          • 3) There is a two-big-blocks, very well documented change in JettySolrServer to have it support running with what I call "common security". This "common security" is activated across all tests inheriting from BaseDistributedSearchTest
          • 4) There is also a fair amount of changes to HttpClientUtil, because I like to isolate everything around credentials setup on HttpClient level in there

          With respect to "applying well for longer periods of time" I am a little afraid that 1), 2) and maybe 4) will not.

          Please consider committing it. I will be around to help fix bugs if there are any, but I am like 99% sure there are not (e.g. the test-suite is still green and we are already using it intensively in our test and staging), and as long as you do not turn on security (as you were not able to do before anyway) I am like 99.9999% sure nothing is broken.

          Hmm, it took some trial & error before I understood why tests failed, then I saw this comment and changed my java.policy file. But how can this be better integrated with Ant so that patching java.policy for all your JDKs is not a requirement for passing Lucene/Solr tests? I think such a requirement may be a no-starter.

          It might be a non-starter, yes. But I only experienced this on my local Mac development machine. Our CI buildagents (plain Ubuntu server 12.04 LTS with plain Oracle java6 installed) had no problem, so I think this might just be a problem out-of-the-box on "normal people's" machines, because most OSes (at least OSX does) force a tighter security policy for java than server installations do. If this is just a problem on "developer machines" and it just runs out-of-the-box on your Jenkins CI machines, I believe it will be OK to just "write our way out of problems" on a "how to setup your development environment" Wiki page.

          The java security model with a policy file is based on the fact that you have to modify the policy file (probably as root) to change permissions. I believe you can point out an alternative policy file when starting the JVM, but once started the java code itself cannot change the permissions. I guess using an alternative policy file (from SVN) is an option.

          It is also an option, and probably the easiest one, to write a small realm ourselves that does not use the JAAS javax.security.auth.AuthPermission stuff behind the scenes.

          Is it an alternative to simply modify Jetty's xml file instead of doing it through API?

          I do not think so. I believe jetty.xml is read by the Jetty JVM after startup, and permissions cannot be changed at that point in time. It is a long time since I looked at how start.jar works, but if it reads jetty.xml and starts another JVM based on information in there it might be possible. But we do not use jetty.xml in JettySolrRunner and I do not think we should change that just to deal with this issue. It is also important that we do not make the "fix" in something that will apply when using the code "for real", because in that case I believe it is important that Solr does not change the java permissions behind the scenes. If a concrete installation of Solr wants to use a realm which requires changes to the java policy, it is important that the admins of this installation actually have to deal with it manually. Remember this is just a problem because of properties of the realm we happen to use for testing.

          Regarding the use of "|" as separator in RegExpAuthorizationFilter. Will it work with regexes containing "|"?

          First of all, have a look at the description of RegExpAuthorizationFilter here: http://wiki.apache.org/solr/SolrSecurity#Security_in_Solr_on_per-operation_basis
          Yes it will work with |'s in the reg-exps. I deliberately put the reg-exp last in the |-separated list of things. "order" is a number and cannot ever contain a |. Roles will not be allowed to contain a |, and you will have to be very sick if you choose a role-name containing |, so that is not a problem either. With those observations I can just do the split this way

          • order is the part before the first |
          • roles is the part in between the first and the second |
          • reg-exp is "the rest" and can contain |'s
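
          A standalone sketch of that split (hypothetical demo code, not the actual RegExpAuthorizationFilter source): using String.split with a limit of 3 means only the first two |'s act as separators, so any |'s inside the trailing reg-exp survive intact.

```java
// Hypothetical demo of the order|roles|regex split described above.
public class ConstraintSplitDemo {
    public static void main(String[] args) {
        // Example param-value with a "|" inside the reg-exp part
        String paramValue = "1|update-role,admin-role|^.*/(update|admin)$";
        // limit = 3: split on the first two "|" only; the remainder is the reg-exp
        String[] parts = paramValue.split("\\|", 3);
        int order = Integer.parseInt(parts[0]);      // 1
        String[] roles = parts[1].split(",");        // update-role, admin-role
        String regex = parts[2];                     // ^.*/(update|admin)$
        System.out.println(order + " " + roles.length + " " + regex);
        System.out.println("/solr/collection1/update".matches(regex)); // true
    }
}
```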

          Or would it be more readable (and cool) in these JSON-times to use a small object as param-value? Then we could keep the base object syntax for any future Filter implementations.

          It would be ok for me to change the syntax into JSON, or maybe support both, but that ought to be another improvement-jira. RegExpAuthorizationFilter is primarily made for test-purposes, even though I put it in the non-test code, because people actually might want to use it "for real" - we will actually do that in my project. It is working the way it is, and if people actually start using it for real and want a "cooler" syntax inside the configuration init-params then make it an improvement-jira.

          By adding a parameter to the utiliy methods without leaving the old signature in place, you break back-compat with people using the old signature

          Oops, I actually intended only to add "new" signatures with AuthCredentials included, but I guess I didn't do it right for "add". That can easily be corrected.

          We should either ADD the new methods or simply skip adding this to convenience signatures. Users wanting auth can do so with the other (more elaborate) technique, they do not need a convenience method for auth

          I tried to get the community opinion on this issue here: https://issues.apache.org/jira/browse/SOLR-4470?focusedCommentId=13582144&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13582144
          But there wasn't much feedback, so I needed to make a decision myself

          I agree that it would be fine if AuthCredentials were not supported by those convenience-methods on SolrServer. Primarily I added them because those methods are used a lot from testing-code where I want to provide credentials, and if I did not add the support for credentials in those methods, changes to the testing-code would be much bigger and definitely harder to accept for a committer. Now the changes to testing-code are fairly easy to grasp, because everything is "the same as before", just with credentials added.

          One can argue that commitWithin is something most users want to / should use while auth will be more of a corner case

          Agree

          Jan Høydahl added a comment -

          It would be ok for me to change the syntax into JSON, or maybe support both, but that ought to be another improvement-jira.

          Sure

          Primarily I added them because those methods are used a lot from testing-code where I want to provide credentials

          Hmm, perhaps make the new convenience methods protected to avoid public use and keep test code nice (if tests are in same pkg). Then we don't commit to supporting a new API right now and we can decide later how or whether to refactor this.

          Is it ok for you to continue on the trunk patch from now? The alternative is to make one for branch_4x. There are pros and cons. Maybe a branch_4x patch will attract more 4.1 users to try it out in their existing dev/test envs? What do others say?

          Per Steffensen added a comment -

          Hmm, perhaps make the new convenience methods protected to avoid public use and keep test code nice (if tests are in same pkg)

          They are not in the same package. SolrServers are in "org.apache.solr.client.solrj.xxx" packages, while tests inheriting from BaseDistributedSearchTestCase are (mostly) in "org.apache.solr" or "org.apache.solr.cloud". We could make them protected and then make a TestSolrServerWrapper (in package "org.apache.solr.client.solrj"), which makes them public, and then use TestSolrServerWrapper across tests (inheriting from BaseDistributedSearchTestCase). It is basically a question of changing "clients" and "downedClients" in BaseDistributedSearchTestCase to be a list of TestSolrServerWrapper instead of SolrServer. And do the same for commondCloudSolrServer.

          Is it ok for you to continue on the trunk patch from now? It's pros and cons. Maybe a branch_4x patch will attract more 4.1 users to try it out in their existing dev/test envs?

          I would prefer branch_4x, because

          • I see no problem just getting it in, because I am so sure it does not break anything - at least as long as you do not set up security, and in practice you were not able to before anyway.
          • Our Solr release is currently based on Apache Solr 4.0 and we probably want to upgrade to be based on another 4.x before going to 5.x, and if this patch is in the 4.x version we upgrade to be based on, I will not have to deal with merging it in. Of course that is a personal argument

          But whatever the community can accept and a committer is willing to do, I will have to live with. Ordered preferences: branch_4x, trunk, security (the new branch), nowhere at all

          The alternative is to make one for branch_4x

          We already have a branch_4x patch - the one I provided!??!

          Mark Miller added a comment -

          What do others say?

          Not much about the branch or what one to target - but I would like to see support from other committers for getting into handling security like this within Solr - it's something all devs will be saddled with in the future. We have kept security related things at the container and user level for a reason in the past - moving from that stance should require a lot of buy in.

          Also, regarding the security policy - that's a no go for me. Even setting it in ant is not great - I don't want to have to pass a security policy to run my tests in eclipse.

          Per Steffensen added a comment -

          We have kept security related things at the container and user level for a reason in the past - moving from that stance should require a lot of buy in.

          Security enforcement is still supposed to be handled by the container - that is standardized and proven technology, so of course we will not change that. But as it is today in practice you cannot configure the container to enforce security, and still have a working system - at least not a distributed/cloud system. Most organizations using distributed/cloud Solr for anything serious would want to configure the container to enforce security. This will require that Solr-nodes are able to provide credentials in requests they send to other Solr-nodes. The container is not and will never be (unless you really twist things) involved in requests going out of Solr-nodes. Providing support for having credentials in those requests going out of Solr-nodes and into other Solr-nodes is what this is all about.
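
          For basic http auth, what a Solr-node has to add to its outgoing inter-solr-node requests boils down to a standard Authorization header. A minimal sketch (assumed demo code, not the patch itself; the credentials shown are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo of building the "Authorization: Basic ..." header value
// that a Solr-node would attach to requests it sends to other Solr-nodes.
public class BasicAuthHeaderDemo {
    static String basicAuthHeader(String username, String password) {
        // RFC 7617: base64 of "username:password", prefixed with "Basic "
        String token = Base64.getEncoder().encodeToString(
                (username + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // Made-up internal credentials; the real ones would come from VM params
        System.out.println(basicAuthHeader("solr-internal", "secret"));
    }
}
```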

          Also, regarding the security policy - that's a no go for me. Even setting it in ant is not great - I don't want to have to pass a security policy to run my tests in eclipse

          I agree, but we can deal with that. Just change the realm used by the test-suite. I will be happy to do that.

          Per Steffensen added a comment -

          New patch SOLR-4470_branch_4x_r1454444.patch

          • Fitting on top of branch_4x revision 1454444
          • Modifying java policy no longer necessary to run the test-suite
          • The public interface of SolrServer is unchanged compared to before SOLR-4470 - new convenience-methods taking AuthCredentials as a param are still added, but they are protected, and made available for testing only through subclasses in TestSolrServers. Tests are now coded against those test-SolrServers instead of directly against the "real" SolrServers. It might be a good thing in general to have a layer of test-SolrServers, a layer where "test"-mods can be made.
          patch -p0 -i <path>/SOLR-4470_branch_4x_r1454444.patch
          svn stat | grep "^?" | awk '{print $2}' | xargs svn add
          svn stat | grep "^A" | awk '{print $2}' | grep "\.java$" | xargs svn propset svn:eol-style native
          ant precommit
          
          Tim Vaillancourt added a comment - - edited

          I'm running the "SOLR-4470_branch_4x_r1454444.patch" now, and can say it's working as expected - all my smoke tests are passing and Collections API CREATE/DELETEs are applying Authorization headers flawlessly. Thanks for your work on this patch!!

          I'm very excited for this patch to make it to mainline. I will say that passwords in JVM command-lines frighten me, but it is far better than no auth, and I think this is the best approach for now.

          Tim Vaillancourt added a comment - - edited

          So after some more testing, I've noticed that the Authorization headers get applied to inter-node requests ONLY when these JVM flags are added:

          "-DinternalAuthCredentialsBasicAuthUsername=<username> -DinternalAuthCredentialsBasicAuthPassword=<password>"

          Without those two JVM flags the Authorization headers ARE NOT passed along on inter-node calls. I noticed in one of Per's original posts he mentioned a two-sided behavior where the headers are provided from the JVM properties (the method that works for me), but also a behavior where they would be passed down from the original Collections API call, an approach I personally like much more. Is the latter approach working, or just not in this patch? It currently does not work for me.

          To reduce changes, simplify the problem and keep more in the container-level, could we introduce a Limitation/Caveat that all SolrCloud nodes MUST HAVE the same basic-auth password, and rely solely on the Collections API request to provide the Authorization header (to be used on subrequests), vs having the user/password in the JVM properties/app level at all?

          Keep in mind that the Solr Web interface and os-level "ps" commands will display the entire JVM command line, in this case with the password in plain text. To me, Solr solely relying on the external call to provide the credentials is more secure, simpler, and keeps Solr less involved in security above it.

          Long story short: could the ONLY approach be to take the Authorization header from the external request and apply it to subsequent inter-node ones, making this a simpler and possibly more secure change?
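          The forwarding behaviour discussed here can be sketched independently of the patch's actual classes: capture the incoming Authorization header for the request thread and re-apply it to any outgoing inter-node request started from that thread, falling back to configured internal credentials otherwise. All names below are hypothetical illustration, not the patch's API:

```java
// Hypothetical sketch of super-request header forwarding; not the patch's actual classes.
public class AuthForwarding {
    // Holds the Authorization header of the external request currently being served.
    private static final ThreadLocal<String> CURRENT_AUTH = new ThreadLocal<>();

    // Called when an external request arrives (e.g. from a servlet filter).
    public static void onIncomingRequest(String authorizationHeader) {
        CURRENT_AUTH.set(authorizationHeader);
    }

    // Called when building an outgoing inter-node request on the same thread:
    // forwards the super-request's header, else falls back to configured
    // "internal" credentials (needed for asynchronous/non-rooted requests).
    public static String headerForSubRequest(String configuredInternalHeader) {
        String forwarded = CURRENT_AUTH.get();
        return forwarded != null ? forwarded : configuredInternalHeader;
    }

    // Called when the external request completes.
    public static void onRequestDone() {
        CURRENT_AUTH.remove();
    }
}
```

          This sketch also shows why the JVM-property credentials remain necessary: asynchronous, non-rooted inter-node requests run on threads with no super-request to forward from.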

          Tim Vaillancourt added a comment -

          Scratch my last suggestion, I see the conditions now where credentials are needed but not provided by a super request. In my case I am only restricting /admin/* which I believe is only used by super requests, however.

          I think credentials in a property file would resolve my concerns about the credentials appearing in the JVM command line. I'll see if I can get that to work.

          Shawn Heisey added a comment -

          +1 on getting this into Solr. I personally don't need this, but I see lots of help requests on the mailing list and IRC channel asking "how can I secure Solr?" The current general answer (4.2.1) is that they can't. Exceptions are:

          1) If the only inter-node communication they have enabled is replication, which has supported basic auth since 1.4.0. There is some anecdotal evidence that basic auth for replication may be broken in 4.2 and/or 4.2.1.

          2) Putting an authenticating proxy in front of Solr so it's the only way for clients to access it.
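          For reference, the replication basic auth mentioned in (1) is configured on the slave side of solrconfig.xml, roughly like this (option names as documented on the SolrReplication wiki; verify against your Solr version):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <!-- credentials the slave sends when polling the master -->
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>
```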

          Alexey Serba added a comment -

          According to HTTP-URL syntax, you can give the basic auth params using http://user:password@host:port/. As Solr is using Commons HttpClient for connections, wouldn't this work automatically if you configure the node's HTTP address using this syntax?

          +1, I like this solution for generic http basic auth support in Solr/Cloud

          Besides that we plan to (later) go for HTTPS in order to encrypt the clear text ... username/password in the requests, and I believe that the URL itself is not being encrypted in HTTPS.

          I don't think this is the case. The "user:password" part won't be transmitted in the URL, but will be parsed by httpclient and submitted as a proper HTTP header.
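          Concretely, the userinfo part of such a URL ends up as a standard Basic Authorization header: the literal string "Basic " followed by base64 of "user:password". A minimal, self-contained illustration using only the JDK (no Solr or HttpClient code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Builds the value of the HTTP Authorization header for basic auth,
    // as an http client would after parsing http://user:password@host:port/
    static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // prints: Basic dXNlcjpwYXNzd29yZA==
        System.out.println(basicAuthHeader("user", "password"));
    }
}
```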

          Jan Høydahl added a comment -

          Now that solr.xml is refurbished and probably going to land in ZK, and we have an option for local filesystem based solr.properties (SOLR-4662), how about allowing auth user/pass to live in solr.xml/solr.properties and optionally override with -D flags?

          If you're using this patch in test/prod, please report any known issues, so we can start the push towards getting this committed.

          Tim Vaillancourt added a comment -

          "how about allowing auth user/pass to live in solr.xml/solr.properties and optionally override with -D flags"
          ^^ exactly what I'm looking for.

          Aside from my passwords-in-commandline concern, I was able to use the patch without issue at the revision it was made for, although that revision is far in the past now.

          Tim

          Sindre Fiskaa added a comment -

          Have tested the patch and it works out of the box! This is a very important feature for securing nodes running in replicated environments. Now the only thing missing is support for SSL.

          When will this patch be released?

          Sindre

          Per Steffensen added a comment -

          Sorry, I didn't follow the discussions on this issue, or I could have provided comments sooner. But I kinda gave up on ever getting "more than 10 lines"-patches committed. I would like to help the community though - it's not your fault.

          I'm very excited for this patch to make it to mainline.
          When will this patch be released?

          Glad to hear that someone out there is using the patch successfully. I would not expect it to get committed soon - maybe never.

          I will say that passwords in JVM command-lines frighten me, but it is far better than no auth , and I think this is the best approach for now.
          Keep in mind that the Solr Web interface and os-level "ps" commands will display the entire JVM command line, in this case with password plain text in JVM command.
          I think a credentials in property file would resolve my concerns about the credentials appearing in JVM command line. I'll see if I can get that to work.
          Aside from my passwords-in-commandline concern, I was able to use the patch without issue at the revision it was made for, although that revision is far in the past now.

          The solution provided, using JVM parameters to pass credentials, is not optimal. But this is what I thought was generic enough to provide to the community. At my place we are dealing with this issue though. Clear-text credentials in files are almost as bad as clear-text credentials on the command line (viewable through ps-like commands), so that was not our solution. We pipe credentials into Solr using stdin. Then we have a bean in jetty (loaded very early in the startup sequence) loading credentials from stdin and setting JVM parameters accordingly. Let me know if you want me to share some code - it's very easy to do.
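          The stdin approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual jetty bean; the property names match the JVM flags mentioned earlier in this thread:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read username then password from stdin at startup
// and expose them as the system properties the patch reads, so they never
// appear on the command line (and thus never in "ps" output).
public class StdinCredentialsLoader {
    public static void load(BufferedReader in) throws Exception {
        String user = in.readLine();
        String password = in.readLine();
        if (user != null) {
            System.setProperty("internalAuthCredentialsBasicAuthUsername", user);
        }
        if (password != null) {
            System.setProperty("internalAuthCredentialsBasicAuthPassword", password);
        }
    }

    public static void main(String[] args) throws Exception {
        // e.g. started as:  echo "user\npass" | java ... StdinCredentialsLoader
        load(new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8)));
    }
}
```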

          You can protect the Solr Web interface, so that you will have to know the credentials in order to get access to read them

          how about allowing auth user/pass to live in solr.xml/solr.properties and optionally override with -D flags

          This is certainly worth considering. It was in the original thought for the solution, but it was scoped out in this first go, to limit the extent/size of the patch.

          If you're using this patch in test/prod, please report any known issues, so we can start the push towards getting this committed.

          FWIW we are using it in production - no problems.

          Per Steffensen added a comment -

          BTW, if you put credentials in ZooKeeper (solrconfig.xml or something) you probably also want to protect ZooKeeper content - SOLR-4580

          Jan Høydahl added a comment -

          This patch is frequently requested, has been reported as working flawlessly by several users, and looks solid.
          I'm now starting the process to get it into the code base. Here is my plan:

          1. Port to trunk and commit
          2. Let it bake for a few days, fixing any bugs that might pop up
          3. Let it stay in trunk yet some time to gain more usage and feedback
          4. Backport to 4.x
          Per Steffensen added a comment -

          Way to go, Jan!

          Per Steffensen added a comment -

          BTW, if this gets committed, we want to go remove all the "if SOLR-4470"-stuff on http://wiki.apache.org/solr/SolrSecurity, and just leave the documentation as "that is the way it is"

          Alexey Serba added a comment -

          We would like to aim at a solution where "original" credentials are "forwarded" when a request directly/synchronously trigger a subrequest

          I'm wondering if all the people who reported that they are using this patch/feature need that "forwarding of original credentials with a subrequests"? What are the common use cases for this?

          Per Steffensen added a comment -

          I'm wondering if all the people who reported that they are using this patch/feature need that "forwarding of original credentials with a subrequests"? What are the common use cases for this?

          The reasoning is that, in general, you do not want a user to be able to have something done (by the subrequest) for him that he was not allowed to do directly himself - forwarding credentials will ensure that. I do not know if that is what "everybody" wants, but to me it seems like a nice default behaviour. This is only the current/default behaviour. The code is nicely structured so that it is prepared for a pluggable "subrequest credentials strategy". You need to have InterSolrNodeAuthCredentialsFactory.setCurrentSubRequestFactory called to change the strategy/behaviour. It is currently not possible to change the strategy from "the outside" through configuration, but it can easily be exposed - because the code is prepared for it.

          If you are willing to hack a little, I guess it is possible to add your own little servlet "MyStrategyChangingServlet" (to web.xml), make sure to add <load-on-startup>, and make a call to InterSolrNodeAuthCredentialsFactory.setCurrentSubRequestFactory(SubRequestFactory) in MyStrategyChangingServlet.init(). If you want the behaviour to be "just always use a fixed set of credentials", you can find inspiration on how to code your special SubRequestFactory by looking at how InterSolrNodeAuthCredentialsFactory.DefaultInternalRequestFactory works - that is the default implementation for getting credentials for inter-solr-requests that are NOT a direct/synchronous consequence of a "super"-request from "the outside".

          Mark Miller added a comment -

          I'm starting now the process to get it into the code base.

          I'm far from sold on this issue. It adds a lot of overhead for future solrcloud work, it ties Solr further to a webapp and jetty when we are trying to move away from those ties, and it puts us on the line for security, something the project has stayed out of.

          Before this got committed, I'd ask for a vote at minimum.

          Jan Høydahl added a comment -

          Before SolrCloud, people could (and did) deploy a Solr cluster using distributed search across HTTPS and auth by simply configuring the servlet container and let the application handle the requests to each node (perhaps except for the &shards= parameter). But with SolrCloud, Solr needs to be aware of this.

          It adds a lot of overhead for future solrcloud work

          This patch provides a clean auth solution which is back-compat in that you need do nothing if you won't use auth. Tests involving distributed/Cloud are indeed modified to pass the auth token; we are talking about an overhead of one extra argument or code line per request.

          it ties Solr further to a webapp and jetty

          I cannot see that this patch ties us any more to jetty. It will work just as well for Tomcat or Resin - it is Solr's test framework which is tied to Jetty, and this patch does not change that fact. If we continue to use Jetty in the post-WAR world this solution will likely work well; if we choose another framework, we need to modify things anyway.

          We should not defer useful, solid features like this to a distant Solr-as-non-webapp release! To borrow from Oracle: "Move Solr forward!"

          Mark Miller added a comment -

          auth by simply configuring the servlet container

          Right - nothing to do with Solr code.

          But with SolrCloud, Solr needs to be aware of this.

          Right - now we are talking about Solr code - my point exactly.

          we talk about an overhead of one extra argument or code line per the request.

          No, I don't think that is the only overhead at all in terms of further SolrCloud development.

          We shold not defer useful, solid features like this to a distant Solr-as-non-webapp release!

          It depends on the feature - to me, I'm not so sure this feature is worth putting Solr code into the security game. I think it should be handled above Solr.

          My current inclination would be to veto the code change unless other Solr developers (especially those that work on SolrCloud) show support for getting into the security game. Solr has explicitly avoided it for a reason - it changes what we are responsible for. It's no longer leaving things up to the container and above - it puts developers on the line.

          "Move Solr forward!"

          I'm not in agreement that it's a big step forward, and I'm not convinced it's worth pulling security into Solr code.

          Per Steffensen added a comment - - edited

          Thanks for taking the time to actually understand what this is and is not about, Jan. You clearly did!

          In general I do not want to argue too much with you anymore, Mark. You won, but IMHO the community and the project lost. My company and I will act on this "fact", but that is my and my colleagues' problem.

          I will try to correct when the truth is not told, though.

          it ties Solr further to a webapp and jetty when we are trying to move away from those ties

          The solution adds a single line that ties Solr further to webapp - the single line in SolrRequestParsers.parse using createAuthCredentialsFromServletRequest to get the super-request credentials from the javax.servlet.ServletRequest. If some day you move away from being a webapp and use another framework (or your own code) to deal with requests coming from the outside, you will need to extract the credentials in another way. But there is already a million things that need to change in such case - in SolrDispatchFilter, SolrRequestParsers etc. In no other way it ties Solr further to webapp. Everything from that point on is dealt with in Solr code - it is not much and it is nicely isolated.

          In no way it ties further to Jetty. Tomcat or Resin or Glassfish or ... will be able to run this code successfully.

          it puts us on the line for security, something the project has stayed out of ... I think it should be handled above Solr

          That is of course an argument. Personally I do not understand why a serious project would stay out of security, and I really do not see how security wrt outgoing requests can be handled outside Solr itself - in a truly secure way.

          Robert Muir added a comment -

          That is of course an argument. Personally I do not understand why a serious project would stay out of security, and I really do not see how security wrt outgoing requests can be handled outside Solr itself - in a truly secure way.

          Staying out of security has nothing to do with being serious or not serious. I've advocated keeping security out of the search engine my entire career. I will do the same here.

          Providing security features is not just something you do, like adding any other feature. You need to have people with a real security background who know what the fuck they are doing to ensure correctness. You need to deal with the inevitable security vulnerabilities and fixes to those. I don't think this is something our PMC should waste its time with. We should make search engines instead.

          So when issues come up about security-related things in lucene/solr that can be done elsewhere outside of it in a more secure way instead (e.g. encrypting directories and so on), you can expect me to push back too.

          Jan Høydahl added a comment -

          I have also been on the "Stay out of security" line all the time, and I think that's one of the benefits of being hosted in a servlet container - folks know how to set up auth/ssl and it will also work with Solr. And if there's a security hole, then they must fix it.

          However, no one has constructively suggested how to make these auth/ssl setups work with SolrCloud without changes in Solr.

          We can tell large users to deploy their search cluster on a separate secure network behind firewalls. But for every large customer there are 50 medium ones, with 2 or 3 Solr nodes in the same network as their Sharepoint or whatever. The only way they can lock things down is setting up firewall rules on every Solr node, allowing traffic from frontend IPs and the IPs of the other Solr nodes. This is a partial workaround, but you still cannot use basic auth or encryption, and communication would still be open to packet sniffing, something which is a non-starter for healthcare and other sensitive data.

          That's why I have become +1 on supporting this specific use case and even put some work in maintaining it.

          Jan Høydahl added a comment -

          We could mark it experimental in the beginning if we want to give it a soft launch for risk-willing users only. I know of at least 2 companies both capable and willing to test, report bugs and submit patches.

          I'll prepare a VOTE thread on the dev list for a) Should SolrCloud support auth/https and b) is SOLR-4470 a viable solution

          Robert Muir added a comment -

          This is a partial workaround, but you still cannot use basic auth or encryption, and communication would still be open to packet sniffing, something which is a non-starter for healthcare and other sensitive data.

          Doesn't mean Solr needs to deal with it: we could recommend IPsec instead, for example. Just like recommending that someone worried about plain-text stuff on their disk can use disk encryption in their OS instead of asking us to write something in Lucene.

          Alexey Serba added a comment - - edited

          The reasoning is that, in general, you do not want a user to be able to have something done (by the subrequest) for him that he was not allowed to do directly himself - forwarding credentials will ensure that.

          Per Steffensen, but this patch currently only does the authn part, not the authz part. I haven't seen examples of how Solr users should implement per-collection or per-document security. That's why I don't understand how someone would use the "forwarding credentials" feature if Solr does not provide any way (best practices, recipes, whatever) to enforce authz policies / security.

          How do you do that in your application? How do you specify who can do what? Where do you enforce that - in a custom UpdateProcessor, SearchComponent, SolrDispatchFilter?

          Jan Høydahl added a comment -

          Doesn't mean solr needs to deal with it: we could recommend ipsec instead for example

          But we can choose to deal with it, to enable broader use of the product.
          I've only used ipsec for VPN stuff. Can ipsec realistically solve this in a small SolrCloud cluster? Is there a software product that can be installed on frontends and Solr servers to set it up transparently to Solr/SolrJ?

          Robert Muir added a comment -

          It works: see your OS documentation on how to configure it.

          Mark Miller added a comment -

          I'll prepare a VOTE thread on the dev list for a) Should SolrCloud support auth/https

          I don't think that's really the question - supporting basic auth and https primarily at the container level is much less controversial. If devs don't really have to consider that stuff (and right now you really don't), that's one thing. When devs are put on the line when it comes to users and security, that's another thing.

          It's the code creep into Solr to support this feature that concerns me. If we offered this feature without the creep into all this SolrCloud code, it would not be very controversial. I realize the reasons there is the creep - I've read the comments - but I still see it as moving over a line we have purposely not crossed before.

          The real question is if we want to push down this security support road - where we advertise this support and developers are on the line for this support. It's both a fuzzy line and a slippery slope - but this issue seems to be moving us down a path I'm not so comfortable with when I consider any future dev I put into SolrCloud.

          SolrCloud is still at an early enough phase that I'm not really willing to spend a lot of time considering security as I add new features or refactor older code. Nor do I want to be on the line when some big company has a security breach due to my code changes. We get around this by saying Solr needs to be protected at a level above Solr - can you set up some basic auth at the container level and run most things over https? Sure, I suppose that's convenient for some - but we don't claim any security - we say it's up to you to sandbox Solr.

          Alexey Serba added a comment -

          However, noone have constructively suggested how to make these auth/ssl setups work with SolrCloud without changes in Solr.

          I think the ssl stuff should be working after the recent http client upgrade and switch to SystemDefaultHttpClient. Now I believe you can set up your key and trust stores using standard Java properties and it should work.

          Basic auth could be implemented by crafting the username and password information into shard HTTP URLs, as suggested by Uwe. It seems http client supports this feature.
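          Alexey's suggestion above can be sketched with plain JDK classes: crafting credentials into a shard URL as user-info, and the equivalent Basic "Authorization" header that a client would send. The class and method names below are hypothetical, for illustration only - this is not code from the patch.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical names, for illustration only.
public class BasicAuthSketch {

    /** Value of an HTTP "Authorization" header for Basic auth. */
    static String basicAuthHeader(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    /** Shard URL with credentials "crafted in" as user-info. */
    static URI shardUrlWithCredentials(String user, String password,
                                       String host, int port, String path) {
        return URI.create("http://" + user + ":" + password + "@" + host + ":" + port + path);
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("solr", "secret"));
        // prints "Basic c29scjpzZWNyZXQ="
        URI shard = shardUrlWithCredentials("solr", "secret", "localhost", 8983,
                "/solr/collection1/select");
        System.out.println(shard.getUserInfo()); // prints "solr:secret"
    }
}
```

          java.net.URI exposes the embedded credentials via getUserInfo(), which is effectively what an HTTP client consumes when credentials are given in the URL.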

          Jan Høydahl added a comment -

          supporting basic auth and https primarily at the container level is much less controversial. [...] Nor do I want to be on the line when some big company has a security breach due to my code changes

          Correct me if I'm wrong, but all handling of security on inbound requests to Solr is still handled fully by the container, even with this patch. I.e. no code that you add to SolrCloud will be able to open a hole for accepting incoming search requests that should not have been accepted. The user configures the realms, user/pass etc fully on the container level.

          With one exception, and that is the -DinternalAuthCredentialsBasicAuthPassword=<password> passed to Solr code, enabling system-initiated inter-node communication. If this is intercepted by outsiders, they potentially gain full access to Solr if they have physical network access. We should find a better way than passing this on the command line. The ultimate would be to do inter-node requests using certificate-based auth instead of passwords.
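          To make the mechanism concrete, here is a minimal sketch of how such internal credentials could be resolved from VM params. Only the -DinternalAuthCredentialsBasicAuthPassword property name appears in this discussion; the username property name and the class are assumptions for illustration, not code from the patch.

```java
// Hypothetical sketch: resolving "internal" credentials from VM params.
// Only internalAuthCredentialsBasicAuthPassword is named in the discussion;
// the username property name is an assumption.
public class InternalCredentialsSketch {

    /** Returns {username, password}, or null if internal auth is not configured. */
    static String[] internalCredentials() {
        String user = System.getProperty("internalAuthCredentialsBasicAuthUsername");
        String password = System.getProperty("internalAuthCredentialsBasicAuthPassword");
        if (user == null || password == null) {
            return null; // no internal credentials configured
        }
        return new String[] { user, password };
    }

    public static void main(String[] args) {
        // Normally these would be set via -D on the command line (or a startup bean).
        System.setProperty("internalAuthCredentialsBasicAuthUsername", "internal");
        System.setProperty("internalAuthCredentialsBasicAuthPassword", "secret");
        String[] creds = internalCredentials();
        System.out.println(creds[0] + ":" + creds[1]); // prints "internal:secret"
    }
}
```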

          If we offered this feature without the creep into all this SolrCloud code, it would not be very contraversial

          Suggestions for less creep welcome.

          I think ssl stuff should be working after recent http client upgrade and switch to SystemDefaultHttpClient. Now I believe you can set up your key and trust stores using standard Java properties and it should work.

          Interesting. Let's explore further... Per?

          Mark Miller added a comment -

          Suggestions for less creep welcome.

          That's more on the shoulders of those that want to get it in - it's not currently a feature I'm after, just one I'm very concerned about going in and changing the game on Solr security support without the support of the developer community.

          inbound requests to Solr is still handled fully by the container,

          If inbound requests are the only concern, let's drop all the intra-node stuff that requires the Solr code changes and call this done.

          Per Steffensen added a comment -

          Sorry to interrupt again, but Jan is clearly still the one who really understands what this is and is not about. It is a shame.

          In this discussion "security" seems to be just one thing. People talk about aspects of security that this issue has nothing to do with AT ALL.

          Security is about a lot of aspects - among others:

          • Authentication: Allowing the client to identify itself as being someone or something. And do it in a way so that you (the server-side) trust him. Examples of someone/something:
            • a) The one(s) knowing a specific set of username/password
            • b) The one(s) holding the private part of a certificate-pair (e.g. RSA)
            • c) The one(s) able to make requests from a machine with a certain IP-address
          • Authorization: Basically a map from the "key" associated with your authentication to a set of things you are allowed to do. "Things you are allowed to do" can e.g. be "functions/operations you are allowed to do" or more fine-grained "data you are allowed to read or update or delete". For a) the "key" will be the username. For b) the "key" will be the public part of the certificate-pair (which you know in advance) and for c) the "key" will be the IP-address
          • Integrity: E.g. on the transport-layer, that data has not been changed on the way from when the client (the authenticated party) sent it until the server receives it
          • Confidentiality: E.g. on the transport-layer, that data has not been read (and understood) on the way from when the client sent it until the server receives it
          • Dealing with the aspects on storage-, application-, transport-, OS-, etc-levels

          Different kinds of technology claim to be able to guarantee different sets of those aspects. E.g. a webcontainer is required to be able to deal with authentication and authorization on the application-layer and integrity and confidentiality on the transport-layer - at least if it wants to be certified, because it is part of the spec that it needs to implement. The fact that a webcontainer is able to deal with those things for you is something people would like to take advantage of - whether or not you guys are willing to accept it. At my company we want to do it, and as I understand it, Jan knows about at least two others that also want to do it. I would claim that at least 90% of Solr users who want "to do security" would like to take advantage of the fact that a webcontainer can handle those things for you.

          Providing security features is not just something you do, like adding any other feature. You need to have people with a real security background who know what the fuck they are doing to ensure correctness. You need to deal with the inevitable security vulnerabilities and fixes to those. I don't think this is something our PMC should waste its time with.

          I agree. But this is NOT about Solr going into "security" in the way that "we handle/guarantee this and that kind of security aspect for you". That is still left to other technologies like e.g. the webcontainer. This issue is all about enabling a SolrCloud cluster to work IF you (a Solr user) choose to have another technology enforce certain security aspects for you. If a Solr user sets up any kind of security technology that requires incoming traffic to a Solr-node to be authenticated (by HTTP basic auth) and this is also enforced for Solr-node to Solr-node traffic, the SolrCloud cluster will not work, and you cannot (easily) make it work in a secure way without Solr changes. If you use the webcontainer (running Solr) to enforce those security aspects, they will be enforced for Solr-node to Solr-node traffic also.

          So when issues come up about security-related things in lucene/solr that can be done elsewhere outside of it in a more secure way instead (e.g. encrypting directories and so on), you can expect me to push back too.

          You might be able to do it "elsewhere" but it will certainly not be "more secure".
          Encrypting directories is something completely different. It deals with confidentiality aspects on the storage-level. This issue has absolutely nothing to do with that.

          we could recommend ipsec instead for example

          IPsec is mainly about integrity and confidentiality on the transport-level (IP-layer). That is a valid alternative to letting the webcontainer deal with integrity and confidentiality on the transport-level (basically requiring HTTPS transport). Using IPsec for authentication and authorization is very complicated, and unless you really want to do major work, it is only able to deal with those aspects based on certificates. People want to use usernames and passwords in a lot of use-cases. You do not see Facebook or Twitter or ... wanting you to generate your own RSA-certificate-pairs and send them the public part. I know Solr has the philosophy that it is not supposed to be exposed directly - instead be exposed indirectly through some kind of gateway (where authentication and authorization wrt "outer users" can be enforced). But if you are fairly paranoid you do not necessarily want to trust those gateways (they might do bad things both intentionally and unintentionally), and therefore you will also likely want to set up security around your SolrCloud cluster itself. Activating the webcontainer's (the one running Solr) ability to do it for you is just an obvious way.

          That's why I don't understand how someone would use the "forwarding credentials" feature if Solr does not provide any way (best practices, recipes, whatever) to enforce authz policies / security. How do you do that in your application? How do you specify who can do what? Where do you enforce that - in a custom UpdateProcessor, SearchComponent, SolrDispatchFilter?

          We use the webcontainer's ability to enforce those aspects of security. For recipes I have added a lot to http://wiki.apache.org/solr/SolrSecurity - go read.

          To spell it out, we do the following:

          • Add to Solr web.xml AT THE VERY TOP
              <filter>
                <filter-name>RegExpAuthorizationFilter</filter-name>
                <filter-class>org.apache.solr.servlet.security.RegExpAuthorizationFilter</filter-class>
                <init-param>
                  <param-name>search-constraint</param-name>
                  <param-value>1|search-role,admin-role|^.*/select$</param-value>
                </init-param>
                <init-param>
                  <param-name>terms-constraint</param-name>
                  <param-value>2|search-role,admin-role|^.*/terms$</param-value>
                </init-param>
                <init-param>
                  <param-name>get-constraint</param-name>
                  <param-value>3|search-role,admin-role|^.*/get$</param-value>
                </init-param>
                <init-param>
                  <param-name>admin-constraint</param-name>
                  <param-value>4|admin-role|^.*$</param-value>
                </init-param>
              </filter>
            
          • Add to Solr web.xml (at the spot where it belongs)
              <security-constraint>
                <web-resource-collection>
                  <web-resource-name>All resources need authentication</web-resource-name>
                  <url-pattern>/*</url-pattern>
                </web-resource-collection>
                <auth-constraint>
                  <role-name>search-role</role-name>
                  <role-name>admin-role</role-name>
                </auth-constraint>
              </security-constraint>
            
              <login-config>
                <auth-method>BASIC</auth-method>
                <realm-name>My Realm</realm-name>
              </login-config>
            
          • Add to jetty.xml
                <Call name="addBean">
                  <Arg>
                    <New class="xxx.yyy.zzz.MyZKLoginService">
                      <Set name="name">My Realm</Set>
                    </New>
                  </Arg>
                </Call>
            

          This basically asks Jetty to handle authentication and authorization for you - at the application-layer. See details on http://wiki.apache.org/solr/SolrSecurity about how it works and why it is done the way it is.

          • Actually we only let the webcontainer deal with the authentication part. We want to do authorization based on URL-patterns, which a webcontainer is able to do. But due to limitations of the <url-pattern> element, the way Solr-URLs are structured, and our requirements for URL-based authorization (basically we want a "search-user" allowed to do searches only and an "admin-user" allowed to do anything), we need to deal with authorization in another way. We deal with URL-based authorization by adding the RegExpAuthorizationFilter filter in web.xml. It does URL-pattern authorization, just as the webcontainer itself is able to do for you, but this solution allows reg-exp URL-patterns, enabling us to enforce the rules we want.
          • We use our own Realm, but you can use one of those that come out of the box with Jetty - see http://wiki.apache.org/solr/SolrSecurity. Our realm is using data that we put in ZooKeeper. ZooKeeper has some properties that make it a nice persistence layer for a realm. It is distributed (unlike local files), so it is easy to make sure that all Solr-nodes at any time authenticate and authorize with the same set of credentials and roles. It also has this nice push mechanism (watchers), enabling us to make changes in the realm-data-foundation (in ZK) and have all realms (living in the webcontainers) become aware of the changes without having to poll for changes all the time.
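          The core matching logic of such a regex-based authorization filter can be sketched in a few lines. This is not the code from the patch - only the constraint syntax ("<order>|<comma-separated-roles>|<regexp>") is taken from the web.xml example above; class and method names are hypothetical. The real filter would additionally read the authenticated user's roles from the HttpServletRequest and send a 403 on denial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Sketch of a regex-based URL authorization check, in the spirit of the
// RegExpAuthorizationFilter above. Hypothetical names.
public class RegExpAuthzSketch {

    static class Constraint {
        final int order;
        final Set<String> roles;
        final Pattern pattern;

        Constraint(String spec) {
            // Spec format from the web.xml example: "<order>|<roles>|<regexp>"
            String[] parts = spec.split("\\|");
            this.order = Integer.parseInt(parts[0]);
            this.roles = new HashSet<>(Arrays.asList(parts[1].split(",")));
            this.pattern = Pattern.compile(parts[2]);
        }
    }

    private final List<Constraint> constraints = new ArrayList<>();

    RegExpAuthzSketch(String... specs) {
        for (String s : specs) constraints.add(new Constraint(s));
        // Lowest order number is consulted first; the first matching constraint decides.
        constraints.sort(Comparator.comparingInt(c -> c.order));
    }

    /** True if a user holding 'userRoles' may access 'path'. */
    boolean permitted(String path, Set<String> userRoles) {
        for (Constraint c : constraints) {
            if (c.pattern.matcher(path).matches()) {
                for (String role : userRoles) {
                    if (c.roles.contains(role)) return true;
                }
                return false; // matched a constraint but holds none of its roles
            }
        }
        return true; // no constraint matched: unrestricted
    }

    public static void main(String[] args) {
        RegExpAuthzSketch authz = new RegExpAuthzSketch(
                "4|admin-role|^.*$",
                "1|search-role,admin-role|^.*/select$");
        System.out.println(authz.permitted("/solr/collection1/select",
                Set.of("search-role"))); // prints "true"
        System.out.println(authz.permitted("/solr/admin/cores",
                Set.of("search-role"))); // prints "false"
    }
}
```

          With the example constraints, a "search-user" can reach /select but the catch-all rule reserves everything else for the admin role - which is exactly the policy the plain <url-pattern> mechanism could not express.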

          supporting basic auth and https primarily at the container level is much less controversial

          Agree. That is exactly why you want to enable a SolrCloud cluster to still work if the Solr admin chooses to let the container enforce that kind of security.

          SolrCloud is still at an early enough phase that I'm not really willing to spend a lot of time considering security as I add new features or refactor older code. Nor do I want to be on the line when some big company has a security breach due to my code changes.

          You will not have to deal with the big company. If the enforcement of security does not work, it is because the technology they use to enforce it does not work. Solr is not enforcing security - the webcontainer or something else is. This patch only introduces the ability in SolrCloud to make it work if the Solr admin chooses to let the container handle security.

          can you set up some basic auth at the container level

          No, basically not. Not before this patch.

          ...and run most things over https?

          No, but that is another issue with Solr - or at least it was the last time I checked it. IPsec is a valid alternative, though.

          I think the ssl stuff should be working after the recent http client upgrade and switch to SystemDefaultHttpClient. Now I believe you can set up your key and trust stores using standard Java properties and it should work.

          Well, you are kind of right, even though you mix up concepts a little. SSL (or HTTPS = HTTP over SSL) is about the transport-layer - nothing to do with this issue, SystemDefaultHttpClient or key- and trust-stores. But SSL uses certificate-pairs to do the encryption on the transport-layer. Those same certificate-pairs can be used for authentication, but that is another aspect and has nothing to do with SSL. But there is one big difference in how easy it is to use certificates for encrypted transport vs using them for authentication. To use them for authentication you need to pre-exchange the public parts of your certificates. To use them for encryption on the transport-layer you do not have to pre-exchange.

          SystemDefaultHttpClient enables us to do certificate-based authentication, yes. But it requires setting up key- and trust-stores and pre-exchanging certificates. People want to use username/password-based authentication in a lot of use-cases.

          Correct me if I'm wrong, but all handling of security on inbound requests to Solr is still handled fully by the container, even with this patch. I.e. no code that you add to SolrCloud will be able to open a hole for accepting incoming search requests that should not have been accepted. The user configures the realms, user/pass etc fully on the container level.

          Correct!

          With one exception, and that is the -DinternalAuthCredentialsBasicAuthPassword=<password> passed to Solr code, enabling system-initiated inter-node communication. If this is intercepted by outsiders, they potentially gain full access to Solr if they have physical network access. We should find a better way than passing this on the command line

          Agree with your concern. The VM-param way of handing over the passwords to Solr is the easiest way, though. I wanted to limit the patch, so that is the only way directly supported for now. But the solution actually does enable you to do it a different way. You can override the CredentialsProviders, or you can choose to use the default one (finding credentials in VM params) but not set the VM params on the command line. We do the latter in my company - basically we pipe credentials in through stdin, have a small bean reading from stdin when the container starts, and add whatever it reads as VM params. Voila - passwords not to be found in the environment or on the command line (exposed by e.g. "ps -ef" or "ps eww").
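          The stdin trick described above can be sketched roughly as follows: a tiny startup hook reads piped "key=value" lines and publishes them as system properties, so passwords never show up on the command line. The input format and class name are assumptions; the actual bean in Per's setup is not shown in this issue.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

// Sketch of a startup hook that reads credentials piped in on stdin and sets
// them as VM params, keeping them out of "ps -ef" output. Hypothetical names
// and input format - not code from the patch.
public class StdinCredentialsSketch {

    /** Reads "key=value" lines and sets each pair as a system property. */
    static void loadCredentials(Reader in) throws Exception {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                System.setProperty(line.substring(0, eq), line.substring(eq + 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // In a real container startup bean this would read from System.in:
        //   loadCredentials(new java.io.InputStreamReader(System.in));
        loadCredentials(new StringReader("internalAuthCredentialsBasicAuthPassword=s3cr3t\n"));
        System.out.println(System.getProperty("internalAuthCredentialsBasicAuthPassword"));
        // prints "s3cr3t"
    }
}
```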

          Jan Høydahl added a comment -

          You can override the CredentialsProviders or you can choose to use the default one (finding credentials in VM params) but not set the VM params using command-line.

          Could make an implementation which reads a system-basic-auth.properties from ZK (given your ZK is encrypted). That would be useful once the other issue about securing ZK is solved.

          Or one which takes a -DinternalAuthCredentialsBasicAuthFile=/path/to/file, where that file would be readable only by the user running Jetty/Tomcat.
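A sketch of what such a file-based provider could do - the class name and the permission check are illustrative, not part of the patch; only the -DinternalAuthCredentialsBasicAuthFile idea comes from the comment above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Properties;
import java.util.Set;

// Illustrative sketch (not from the patch): load credentials from the file
// named by -DinternalAuthCredentialsBasicAuthFile=/path/to/file, refusing
// files that other users could read.
public class FileCredentials {

    public static Properties load(Path file) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(file);
        if (perms.contains(PosixFilePermission.GROUP_READ)
                || perms.contains(PosixFilePermission.OTHERS_READ)) {
            throw new IOException(file + " must be readable only by the owner");
        }
        Properties creds = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            creds.load(in); // standard key=value lines
        }
        return creds;
    }
}
```

The permission check is the point of the variant: the password leaves the command-line and environment entirely, and the OS enforces who can read it.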

          I agree that since this is pluggable, it is not a showstopper - users can choose how paranoid they are and plug whatever suits them.

          Q: Would it make sense to make CurrentInternalRequestFactory or InterSolrNodeAuthCredentialsFactory pluggable through solr.xml? Currently you need to patch and build Solr to change it, right?

          Mark Miller added a comment -

          Solr is not enforcing security - the webcontainer or something else is.

          A comment right in the patch explains this is not the case - it says the web container authorizes any role and then new Solr code is responsible for dealing with role authorization. This is Solr code that can introduce security bugs. This is the slippery slope, this is the fuzzy line, this is the creep.

          I'm fine with minor extensions to allow the web container security to work in many cases. I'm not currently okay with security code in Solr.

          Per Steffensen added a comment -

          Will respond to your comments and questions Monday (it's late in Europe now), but in the meantime, can I ask you, Mark, to point out the exact comment you are referring to, so that I am sure I will address your exact concern? Doing a quick search I was not able to locate the place you are referring to. Thanks.

          Mark Miller added a comment -

          Personally I do not understand why a serious project would stay out of security

          It's simply the current stance of the project. Like I said, if a majority of Solr devs disagree with me and want to move down this road, I'm not very likely to stand in their way. But I want to see the dev/PMC support for a change in a very long-standing policy to stay out of security code.

          Per Steffensen added a comment - edited

          Could make an implementation which reads a system-basic-auth.properties from ZK (given your ZK is encrypted). That would be useful once the other issue about securing ZK is solved.

          Guess you are talking about SOLR-4580. It is not about encryption (neither storage- nor transport-level). It is about authentication and authorization at application/API-level. But you are right, it is an option to build on top of this issue and allow those internal credentials to live in ZK - just make sure the security issues in doing that are dealt with.

          Q: Would it make sense to make CurrentInternalRequestFactory or InterSolrNodeAuthCredentialsFactory pluggable through solr.xml? Currently you need to patch and build Solr to change it, right?

          Yes, you need to patch and rebuild. But that is because I did not include as much code in the patch as I wanted to, and as I did for SOLR-4580. In the patch for SOLR-4580 I've included code so that you, through JVM params, can specify the name of a class which will be used as credentials/ACL-provider. The same should have been done in the patch for this SOLR-4470. I ought to have included code so that you, through JVM params, can point out the classes you want to be used as credentials-providers. Basically, JVM params that control which implementations of InterSolrNodeAuthCredentialsFactory.SubRequestFactory and InterSolrNodeAuthCredentialsFactory.InternalRequestFactory are to be used by default for InterSolrNodeAuthCredentialsFactory.CurrentSubRequestFactory and InterSolrNodeAuthCredentialsFactory.CurrentInternalRequestFactory.

          JVM params are the simplest way to control which implementations are used behind the interfaces. That is, in my opinion, what should have been included here. Going from control through JVM params to adding support for control through solr.xml or something else should be another issue, but it is certainly a good and valid idea.

          Solr is not enforcing security - the webcontainer or something else is.

          A comment right in the patch explains this is not the case - it says the web container authorizes any role and then new Solr code is responsible for dealing with role authorization. This is Solr code that can introduce security bugs. This is the slippery slope, this is the fuzzy line, this is the creep.

          Well, you did not point out the exact comment as I asked you to, so I will have to guess a little. The code going into the real non-test part of Solr does not do anything to enforce security. It only enables a Solr admin to configure Solr-nodes so that their inter-communication will still work if the Solr admin chooses to set up e.g. container-managed security.

          In order to claim that my solution solves the problem, I want to test that it does. Test strategy: set up container-managed security and verify that all inter-Solr-node communication works if you use my solution. So the test-code sets up container-managed security, and in it there is a comment about just letting the container manage authentication and handling authorization in a filter. But this is all a simulation of what the Solr admin decided to do to set up security. This is test only!

          Personally I do not understand why a serious project would stay out of security

          It's simply the current stance of the project

          Well, I haven't been in the meetings or wherever this stance was established, but I would imagine that this stance is about Solr not going down the path of enforcing or controlling or ... security. I cannot imagine that this stance means we would not want a SolrCloud cluster to work if a Solr admin chooses to activate third-party security in a very common way.

          Jan Høydahl added a comment - edited

          JVM params are the simplest way to control which implementations are used behind the interfaces. That is, in my opinion, what should have been included here. Going from control through JVM params to adding support for control through solr.xml or something else should be another issue, but it is certainly a good and valid idea.

          The way we normally set up solr.xml configs is with <mytag>${my.jvm.param}</mytag> style, so the admin can choose whether to pass the option as a JVM param or include it in solr.xml directly. Something like

          <solr>
            ...
            <subRequestFactory class="${solr.subRequestFactory}" />
            <internalRequestFactory class="${solr.internalRequestFactory}" />
          </solr>
          
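The ${my.jvm.param} style above relies on substituting JVM params into the config value. A rough sketch of how such substitution could work (illustrative only; this is not Solr's actual config-parsing code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of ${...} substitution against system properties,
// in the style solr.xml values use; not Solr's actual implementation.
public class PropSubstitution {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${name} in the value with System.getProperty(name),
    // or the empty string when the property is unset.
    public static String resolve(String value) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = System.getProperty(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With -Dsolr.subRequestFactory=com.example.MyFactory set, resolving the class attribute of the sketched <subRequestFactory> tag would yield that class name; without the param, it falls back to an empty value the admin can fill in solr.xml directly.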

          Regarding Mark Miller's concern with authorization "creep", I to some extent agree. But since, as you say, this is test-code only, let's move the class RegExpAuthorizationFilter from runtime codebase and into the test framework. In that way, it is clear that it is only used for realistic test coverage. And if anyone wishes to setup a similar setup in their production they may borrow code from the test class, but it will be a manual step reinforcing that this is not a supported feature of the project as such.

          Jan Høydahl added a comment - edited

          I tried to move RegExpAuthorizationFilter to test scope, but there is a compile-time dependency in JettySolrRunner's lifeCycleStarted() method. Can we refactor this piece of code into test-scope as well, e.g. by exposing a Filter setter on JettySolrRunner?

          Per Steffensen added a comment -

          Regarding Mark Miller's concern with authorization "creep", I to some extent agree. But since, as you say, this is test-code only, let's move the class RegExpAuthorizationFilter from runtime codebase and into the test framework. In that way, it is clear that it is only used for realistic test coverage. And if anyone wishes to setup a similar setup in their production they may borrow code from the test class, but it will be a manual step reinforcing that this is not a supported feature of the project as such.

          RegExpAuthorizationFilter could be in test-code because it is only used for tests. As I described above (somewhere), it just does something I believe a lot of people might want to do - simply because Solr-URLs are kinda upside-down. Therefore I put it in the non-test part of the code, so that people could use it if they found it useful, and described a little about it on http://wiki.apache.org/solr/SolrSecurity#Security_in_Solr_on_per-operation_basis.

          If you do not get the point that this is something you can use if you decide to, and that it is really not a Solr-thing, then I agree it could be considered creeping into dealing with security enforcement in Solr. You are welcome to move it to test-code, but then we should change the descriptions on http://wiki.apache.org/solr/SolrSecurity#Security_in_Solr_on_per-operation_basis. Either remove the descriptions of RegExpAuthorizationFilter, or include the code from it directly on the Wiki-page for inspiration, or make a pointer to the test-code and say that you can steal it there.

          I tried to move RegExpAuthorizationFilter to test scope, but there is a compile-time dependency in JettySolrRunner method lifeCycleStarted(). Can we refactor this piece of code into test-scope as well, e.g. by exposing some a Filter setter on JettySolrRunner?

          Isn't JettySolrRunner only used for tests? Why is it not in test-scope itself? Maybe in test-framework? We can make a funny refactor and expose a filter-setter, but it seems like a strange thing to do to let JettySolrRunner, which is only used for tests, be able to use some test-stuff. What did I miss?

          Sindre Fiskaa added a comment -

          Hi
          We started using Solr half a year ago. Solr has given our application new functionality that we could only dream of before, so this has been very good. The problem is that our customers demand high uptime. SolrCloud fits perfectly and provides redundancy and failover. But our customers also have high security requirements. We have sensitive information and we protect our Solr nodes with, at minimum, basic authentication + SSL. We rely on this patch to take Solr out to our customers, so we hope this patch gets released.

          Ryan McKinley added a comment -

          just skimming the patch... it looks like it adds:

          METHOD.GET, SEARCH_CREDENTIALS
          

          to every request.

          What about using a SolrServer implementation that adds whatever security it needs?

          HttpSolrServer has "invariantParams" with a comment:

            /**
             * Default value: null / empty.
             * <p/>
             * Parameters that are added to every request regardless. This may be a place
             * to add something like an authentication token.
             */
            protected ModifiableSolrParams invariantParams;
          

          This may offer a client-only option.
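To illustrate the idea (hypothetical code, not SolrJ's actual request path): params registered as invariant are simply merged into every request's parameter set before the query string is built, so an auth token set once would ride along on each call.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class InvariantParamsDemo {
    // Merges per-request params with invariant params (invariant values win)
    // and renders them as a query string, as a client wrapper might do.
    static String buildQuery(Map<String, String> request, Map<String, String> invariant) {
        Map<String, String> merged = new LinkedHashMap<>(request);
        merged.putAll(invariant); // invariant params are appended to every request
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

Note the limitation raised later in this thread: credentials that must vary per request (e.g. forwarded from a super-request) do not fit an "invariant" mechanism.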

          Jan Høydahl added a comment -

          I am currently porting the patch to trunk. There are several new APIs added which need instrumentation.

          At the same time, I am also moving RegExpAuthorizationFilter into test-framework and adding plugin support in solr.xml for plugging in your own internalRequestFactory and subRequestFactory.

          Will upload a new patch once ready, and probably also commit to the "security" branch. Next, will explore Ryan McKinley's proposal for enforcing invariant params through TestHttpSolrServer.

          Mark Miller added a comment -

          And if anyone wishes to setup a similar setup in their production they may borrow code from the test class, but it will be a manual step reinforcing that this is not a supported feature of the project as such.

          I would be much more okay with this - in this way, we are not responsible for the security this code provides. It's not shipping production Solr code; it's code a user can plug in as a filter himself and be responsible for himself.

          Per Steffensen added a comment -

          just skimming the patch... it looks like it adds:

          METHOD.GET, SEARCH_CREDENTIALS

          to every request.

          Seems like you have been skimming test-code. METHOD.GET is not something you add - it's where you decide which HTTP method to use. That was implicit before, but explicit now, simply because I wanted to introduce a credentials parameter to many of the request-helper-methods, and chose to add it only as the last parameter to the existing variant that had the highest number of params. It results in a lot of repetition of the same parameters for those request-helper-methods in test-code, but the alternative to me was to make a major refactoring/cleanup in this area, and that really isn't in scope. It ought to be cleaned up a little, though. But it always did - both before and after this patch.

          ... and adding plugin support in solr.xml for plugging in your own internalRequestFactory and subRequestFactory.

          Nice!

          But what happened to small steps? I deliberately did not do things like that, to keep the first step as small as possible.

          Next, will explore Ryan McKinley's proposal for enforcing invariant params through TestHttpSolrServer

          Credentials are not (Solr-)params, and if you want "forward from super-request" they are not even "invariant". Besides that you need to be able to control which credentials to use from outside the Solr-node (in a kind of config) - and you probably also want to be able to control the "strategy".

          ... its code a user can plug in as a filter himself and be responsible for himself

          It was that all along. Now we just make it a little harder for him to "borrow" the filter. But Ok for me - it is only a very peripheral thing, and not important to the essence of the patch or this issue.

          Mark Miller added a comment -

          If it was not in test code, it was not that all along at all. That's just silly. It was custom security code in Solr shipping code, and a major problem with this patch. Like I said above - if the security is all handled by the container, this issue is much less controversial.

          Jan Høydahl added a comment -

          But what happened to small steps?

          Yeah, you got me there. Trunk is quite different. Will put something up once tests are green again.

          Per Steffensen added a comment -

          if the security is all handled by the container, this issue is much less controversial

          It is and that is the way it always was with this patch

          If you tell me that this issue with RegExpAuthorizationFilter being in non-test code, even though it is only used from test code, always was your (single) issue with this patch, I am going to laugh my ass off. You clearly never looked into it, never understood it and never wanted to understand it - maybe until a few days ago (I'm not sure).

          Tim Vaillancourt added a comment -

          I would agree with putting security at the container level if SolrCloud would actually function under that sort of implementation. The reality though is it does not work - inter-node requests fail, several things break.

          That distinction to me DOES make this patch/issue "Solr's problem" to solve, and I don't look at that as a bad thing, but a normal one in software development. I also don't agree that fixing an issue like this commits the Solr Project to being on the hook for security, either. No one expects backend datastores to be on the hook for security, but they still work on it lightly from time to time when it blocks valid use cases for the tech.

          Maybe I'm missing something - is there a way for SolrCloud to 100% work, passing all tests with container-level ONLY security? If there is a way, then I don't need this patch, but I don't think that is the case.

          Mark Miller added a comment -

          always was your (single) issue with this patch, I am going to laugh my ass off. You clearly never looked into it, never understood it and never wanted to understand it - maybe until a few days ago (Im not sure).

          No - it's not the single issue that I have - but it is the issue that would cause me to -1 the patch. Given that some other committers have expressed some support, it's my attempt to work with them in turning this patch into something I am willing to accept - not necessarily something I want as part of the code base. That's why I say it makes it less controversial - it doesn't make me happy with this issue, but it's a path towards me not vetoing this issue.

          IMO, all this security should be handled above Solr - I don't support this patch. That in itself will not stop it from getting in though.

          I also don't agree that fixing an issue like this commits the Solr Project to being on the hook for security, either.

          I disagree with you - once we start adding code to deal with roles or authorization or anything of the kind, that puts us on the hook - that's just how it is, and why we don't have code like that.

          The code to facilitate passing internode auth - while not something I'm interested in - is not what puts us on the hook regarding security and Solr code.

          Per Steffensen added a comment -

          is there a way for SolrCloud to 100% work, passing all tests with container-level ONLY security

          No. At least if you by "container-level security" mean protecting URLs so that you have to authenticate to "get in".

          I also don't agree that fixing an issue like this commits the Solr Project to being on the hook for security, either.

          I disagree with you - once we start adding code to deal with roles or authorization or anything of the kind, that puts us on the hook - thats just how it is and why we don't have code like that.

          But we do NOT "start adding code to deal with roles or authorization or anything of the kind". There is not a single line in the patch (outside test) that deals with roles or authorization. Except for RegExpAuthorizationFilter if you really want to push it, but that is going to test-space now.

          The code to faciliate passing internode auth - while not something I'm interested in - is not what puts us on the hook regarding security and Solr code.

          Nice to hear, because that is all this patch is about.

          Jan Høydahl added a comment -

          Trunk patch which applies to r1499578

          This is more or less a dump of what I currently have. There are nocommit's and some tests commented out.

          Anyway, the patch includes all of the previous mentioned changes, such as

          • RegExpAuthorizationFilter moved to test space
          • Pluggable internalRequestFactory and subRequestFactory in solr.xml
          • New subclass JettySolrRunnerWithSecurity which lives in test-space only for instrumenting Jetty with filters

          I need help with getting the commented-out tests in SecurityDistributedTest pass. Somehow modifying the default internode pass does not take effect on the sub-requests to core-admin on the shards. I have verified that the shards are created by JettySolrRunnerWithSecurity but perhaps the running shards do not pick up the change somehow?

          Happy testing, and feedback welcome

          Jan Høydahl added a comment -

          I've also checked in the patch to my GitHub https://github.com/cominvent/lucene-solr/tree/SOLR-4470

          Mark Miller added a comment -

          Except for RegExpAuthorizationFilter if you really want to push it, but that is going to test-space now

          That's the point - I've explained my position pretty clearly above - if that did not go to test space, this patch would be vetoed.

          – Mark
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Tim Vaillancourt added a comment -

          This thread seems to be going nowhere fast - perhaps too much is being tackled here at once.

          I've created a smaller request JIRA for the setting of ANY arbitrary HTTP header on sub-requests. A user wanting to achieve basic-auth on subrequests (this request comes up all the time) could just set a static "Authorization" header, or any HTTP header for that matter.

          This approach should be a happy medium because it is not "security-related" (not committing Solr to security), and can be used much more generically in several other use cases (Host header, etc).

          https://issues.apache.org/jira/browse/SOLR-5293 ("Allow setting of HTTP headers on SolrCloud subrequests").

          Ideas/thoughts appreciated!
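For reference, the static "Authorization" header such a feature would carry for basic auth is just "Basic " plus the base64 encoding of user:password (illustrative snippet, not part of the patch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {
    // Builds the value of an HTTP "Authorization" header for basic auth.
    static String value(String user, String password) {
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }
}
```

For example, BasicAuthHeader.value("solr", "s3cret") returns "Basic c29scjpzM2NyZXQ=", which could be set as a static header on every sub-request.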

          Jan Høydahl added a comment -

          You're right that the patch is huge and would be better broken up in smaller steps. However, the patch SOLR-4470_branch_4x_r1454444.patch is known to work well if you check out that revision, so you may want to test that. The trunk patch and my GitHub branch are currently broken, but the plan is to get them working and let the feature bake in trunk for a while before backporting to 4.x.

          Per Steffensen added a comment -

          We are currently upgrading our version of Solr to be 4.4.0-based instead of 4.0.0-based. Basically merging the changes between 4.0.0 and 4.4.0 into our version of Solr. We have a lot of changes, including my solution of SOLR-4470. After we finish the upgrade we will basically have "information" in our SVN about how a patch on top of 4.4.0 would look. It would be a little hard to extract because I have to filter all of our other changes from the patch, but it can be done.

          Maybe I will be able to convince my boss once again to spend time creating a patch fitting Apache Solr, even though he kinda lost interest in that, because patches (almost) never get into Apache Solr anyway. Will you be able to use a 4.4.0 based patch for anything?

          Regarding working in small steps: It is too hard for me as a non-committer. It would be too hard feeding in small patches, discussing forever to get each one in, then the next patch. We create full new features (fully working and covered by tests), because we want them, and offer them to Apache. It is not that I do not see the argument for small patches, it is just not prioritized on our side.

          Jan Høydahl added a comment -

          I think the only way it's going to get into the code base is to first commit to trunk for baking and then backport. I screwed up somewhere along the way doing too many changes at once. If you'd get green light to spend some time on this, could you perhaps help out with a trunk port? I'm planning to look into this again myself soonish.

          Dana Sava added a comment -

          Hello,
          We are currently using SOLR 4.5.1 in our production environment and we tried to setup security on a SOLR cloud configuration. I have read all the 4470 issue activity and it will be very useful for us to be able to download the SOLR-4470_branch_4x_r1452629.patch already compiled from some place, until the 4.7 version is released. Can somebody help me with this issue?
          Thank you,
          Dana

          Per Steffensen added a comment -

          We are currently using SOLR 4.5.1 in our production environment and we tried to setup security on a SOLR cloud configuration.

          Container managed authentication and authorization I presume?

          I have read all the 4470 issue activity and it will be very useful for us to be able to download the SOLR-4470_branch_4x_r1452629.patch already compiled from some place, until the 4.7 version is released.

          Guess you are looking at "Fix Version/s: 4.7" on this issue, and expect that this means that the fix will be in 4.7. I do not believe it will - unfortunately. So if you want the feature, you need to change the patch yourself to fit the version of Solr you are using, or you can download code for Solr 4.4 plus numerous improvements (including SOLR-4470) here: https://github.com/steff1193/lucene-solr. You will have to build a Solr distribution yourself - and maven artifacts if you need those

          • Building distribution from source
            <checkout>
            cd solr
            ant -Dversion=4.4.0.myversion clean create-package
            
          • Building and deploying artifacts is a little more complicated. Let me know if you need that.

          Please note that https://github.com/steff1193/lucene-solr is only a place where we keep our version of Lucene/Solr, including the changes we have made which have not yet been committed to Apache Solr. You are free to use it, but there is no guarantee that there will ever be a version based on an Apache Solr version higher than 4.4. It is very likely that there will be, but no guarantee, and you never know when it will happen. Of course it is all open source, so if you really want you can fork it yourself.

          David Webster added a comment -

          I have to admit I find the content of this issue disturbing, coming from such a major open source project as Solr. I came here looking for a viable security solution that did not involve segmenting off the system or otherwise using IPsec and other IP-address-centric forms of security. For truly Enterprise-worthy solutions, the products themselves simply must address security internally. This product does not, and even worse, the core Dev team seems intent on NEVER doing so!

          As the lead Java architect for Distributed Systems Engineering at a Fortune 100 company, security is my single most important concern. I don't care how fast a product is, or how many slick features it has; if it isn't secure at the core, it is worthless as an Enterprise solution (at least for any Enterprise that gives a whit about REAL security). Solr is doomed to be used as a lab experiment in any serious Enterprise implementation where security is more than an afterthought.

          I like Solr. I like what it does and how it does it. However, its lack of internal security hooks is a complete show-stopper for use at my firm. So my choices are to internalize the code, using this patch as our starting point, and have our own Solr-like engine, or move on to something like ElasticSearch, which actually cares about real security at the node-to-node level.....

          Also, Mavenize the damned thing! Modern projects still use Ant? I haven't opened a build.xml script in half a decade or more....

          Shawn Heisey added a comment -

          This product does not, and even worse, the core Dev team seems intent on NEVER doing so!

          I don't know that we never intend to add security. We face a major problem with doing so at this time, though: We have absolutely no idea what servlet container the user is going to use for running the Solr war. The example includes Jetty, but aside from a few small edits in the stock config file, it is unmodified. Solr has no control over the server-side HTTP layer right now, so anything we try to do will almost certainly be wrong as soon as the user changes containers or decides to modify their container config.

          Solr 5.0 will not ship as a .war file. The work hasn't yet been done that will turn it into an actual application, but it will be done before 5.0 gets released. Once Solr is a "real" application that owns and fully controls the HTTP layer, security will not be such a nightmare. You mention ElasticSearch and its ability to deal with security. ES is already a standalone application, which means they can do a lot of things that Solr currently can't. It's a legitimate complaint with Solr, one that we are trying to rectify.

          Also, Mavenize the damned thing! Modern projects still use Ant? I haven't opened a build.xml script in half a decade or more....

          I can't say anything about maven vs. ant. I don't have enough experience with either.

          David Webster added a comment -

          Thanks for the update, Shawn. The move to a stand-alone implementation should be a good one, with hope that a robust security implementation will be at the very top of the priority list. Not sure what the timeline for that is, but I've got a fairly short one for laying down the foundation of our Enterprise Search by 3rd Qtr. That will have to pass IA muster (mainstream Solr does not), which still leaves me in a bit of a quandary as to how to proceed. I don't want the added TCO of maintaining our own search engine, but cannot wait around very long for viable solutions to surface, either. I'm either going to have to implement this patch branch, or move on to other engine choices...

          I know JBoss, JBPM specifically, used to be ant based but they've gone full Maven now. This is the first big Open Source project I've run across in some time that still uses Ant. Not many devs on our staff can still read a build.xml file anymore...and those that can would rather not...

          Per Steffensen added a comment - - edited

          This product does not, and even worse, the core Dev team seems intent on NEVER doing so!

          At least most of them, yes. It is really a shame.

          As the lead Java architect for Distributed Systems Engineering at a fortune 100 company, security is my single most important concern

          As the tech lead on the largest REAL SolrCloud installation on the planet, I agree. I believe I can say that we have the largest installation in the world for two reasons:

          • Upgrading from one version of SolrCloud to the next does not seem to be very important in this product. At least it is hard to do, and there seems to be no testing of it when a new release 4.y comes out - no testing that you can actually upgrade to it from 4.x. This makes me believe that no-one, or at least only a few, have installations so big that just installing 4.y and storing/indexing all data from the old 4.x installation from scratch is not an option. If others actually had to do upgrades where this is not possible, lots of complaints would pop up - and they don't
          • Our biggest system stores and indexes 1-2 billion documents per day, and has 2 years of history. That is about 1000 billion documents in Solr at any time, with 1-2 billion going in every day (and 30-60 billion going out every month). To be able to run such a system we needed to do numerous optimizations; in general, without optimizations you will never get such a big system working. I do not see much talk around here about optimizations of that kind - probably because people have not run into the problems yet.

          I like Solr. I like what it does and how it does it.

          Me too. On that part it actually has numerous advantages over e.g. ElasticSearch. We used ES to begin with, and we liked it, but for political reasons we were not allowed to keep using it, and we turned to finding an alternative. At that point in time SolrCloud (4.x) was only in its startup phase (a year before 4.0 was released), but we believed so much in the idea behind it that we decided to go for it.

          However, it's lack of internal security hooks is a complete show stopper for use at my firm

          For us, too. That is why we made our own fix to it - provided as a patch here and also available at https://github.com/steff1193/lucene-solr

          Using this patch as our starting point

          I am happy to hear that. Please feel free to contact me if you have any problems making it work or understanding what it does. I might also be able to provide a few tips on making it extra secure

          and have our own Solr-like engine

          We made the same decision years ago. We have had our own version of Solr in our own VCS for years. Just recently I put the code on https://github.com/steff1193/lucene-solr. No releases (incl maven artifacts) yet. But that will come soon. Until then you will have to build it yourself from source.

          Also, Mavenize the damned thing! Modern projects still use Ant? I haven't opened a build.xml script in half a decade or more....

          Already done.

          ant [-Dversion=$VERSION] get-maven-poms
          

          This will build the maven structure in the folder "maven-build".
          E.g. if you use Eclipse:

          ant eclipse
          

          In Eclipse right-click the root-folder, choose "Import..." and "Existing Maven Projects". Import all Maven pom.xmls from the maven-build folder

          We have absolutely no idea what servlet container the user is going to use for running the solr war.

          It isn't important for this issue. Protecting the HTTP endpoints with authentication and authorization is standardized in the servlet-spec. All web-containers have to live up to that standard (to be certified). The only place where the standardization is not very clear is how to install a realm (the thingy knowing about user-credentials and roles), but all containers have plenty of documentation on how to do it.
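          To make the point concrete: the container-enforced protection referred to here is plain servlet-spec configuration in web.xml. A hedged illustration only - the role name, realm name, and URL pattern below are made-up examples, not anything shipped with Solr:

```xml
<!-- Illustrative web.xml fragment; role and realm names are examples only -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>All endpoints</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-user</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>example-realm</realm-name>
</login-config>
<security-role>
  <role-name>search-user</role-name>
</security-role>
```

          How the realm behind "example-realm" is installed is the container-specific part mentioned above; the constraint and login-config elements themselves are identical across compliant containers.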

          It is very important to understand that this issue, and the patch I provided, will work for any web-container. This issue is not about enforcing the protection - let the web-container do that. This issue and the patch are ONLY about enabling Solr to send credentials in its Solr-node-to-Solr-node requests, so that things will keep working if/when you make the obvious security decision and make use of the security-features provided to you for free by the container.

          Solr has no control over the server-side HTTP layer right now, so anything we try to do will almost certainly be wrong as soon as the user changes containers or decides to modify their container config.

          NO!

          Solr 5.0 will not ship as a .war file

          Bad idea. This is one of the points where Solr made a better decision than ES

          Once Solr is a "real" application that owns and fully controls the HTTP layer, security will not be such a nightmare

          Web-containers let you control exactly what you ought to be able to control through configuration.
          Security is not a nightmare today. If you remove the web-container it will become a nightmare.
          In web-containers it is standardized and most people know how it works. The web-container vendors (JBoss, Glassfish, Tomcat, WebLogic, Jetty, Geronimo, WebSphere, Trifork T4, etc etc) have spent years building and stabilizing their security-implementations. Thinking that we can do better by ourselves in Solr is just naive.
          Web-container vendors have also spent years and years developing, stabilizing and optimizing all the other stuff that web-containers give us for free (optimized HTTP end-points, thread-control/pooling, security etc etc). Believing that Solr can do better in those areas is just naive. Why not let web-containers deal with what they were made to deal with, and let Solr deal with its core area of business?

          The decision to go away from letting the web-container deal with security also kind of contradicts what Mark said earlier: "We have kept security related things at the container and user level for a reason in the past - moving from that stance should require a lot of buy in." I agree with Mark on that one, so why go away from letting the web-container deal with security? Web-containers are very good at handling security - why not let them do that job?

          Mark Miller added a comment -

          The bulk of this patch was not that contentious. The rest seemed to mostly be hashed out. The missing piece has been a committer with the skill and time to put it in, take responsibility for it, and support it.

          Jan Høydahl added a comment -

          I started the port to trunk along with some other changes last summer, but did not get to finalize it within the time available at that time. I also realized I need some help moving along, as I'm quite a novice on servlet security.

          Implementing this patch for 5.0 and 4.x would still be worth the effort, should we choose to replace the container with Netty or something else, since most of the internal inter-node communication will stay the same - is that correct?

          When I dived into this last time around, the intent was to commit a working impl to trunk first, let it bake for a few weeks (perhaps with the test framework randomizing security on/off) and then backport. This is best practice for big changes, and this patch is HUGE. So here is one committer willing to contribute, but I need some help from someone willing to take a look at https://github.com/cominvent/lucene-solr/tree/SOLR-4470 and find out what 1% is missing for it to work, and then get it up to date with current trunk...

          David Webster added a comment -

          Again, appreciate the input; looks like the issue is at least alive. We are meeting Friday on this issue to plot our strategy. I am getting familiar with the specifics of the issue, and am coming to realize the type of HTTP container is largely irrelevant, so long as it is a spec-compliant servlet container (as Tomcat and Jetty are).

          I do not particularly agree with the need for a container, however. We are gradually moving away from pre-packaged "containers" ourselves, instead moving towards framework tools like Spring Web and Grizzly2. We write all our own JAAS LoginModules today and have a deep bench when it comes to managing service-side security, be it servlet (RESTful/HTTP), JMS, or anything else. There are pluses and minuses in whether or not to use standard containers or roll your own Servlet implementation. Another discussion for another day....

          We have had the same issue that is present in Solr in our own RESTful service implementations when making them secure. We have a maturing RESTful/HTTP security standard, and that requires our REST client code to do very specific things when making downstream requests to secure services that expect a very specifically secured request. For instance, I can add a valve to Tomcat to have it check for a user's SiteMinder cookie and then validate it with a call to a Policy server. I could also implement a "secret key" (Kerberos-type thing). I can implement that capability on the service side via a JAAS LoginModule and Tomcat Valve configuration, without digging into Tomcat core code. But on the client side I have to write actual core code to place the SiteMinder token/secret key encryption, etc., in a cookie or header and send it downstream.

          I imagine the same must be true in SolrCloud. I can lock down the receiver side via configuration and standard container plugins, but it's the sender side that we can do nothing about without some core code modification that would allow us to send whatever security artifacts downstream we deem appropriate. My main fear is performance within the cloud during the sharding processes.

          Per Steffensen added a comment -

          ...and this patch is HUGE

          Yes it is, but a lot of it is on the testing part. Hopefully I will have the time to help out at some point, but my customer is not as willing as he has been to allow me to spend time on getting patches all the way into the Apache code-base. He (we) has realized along the way that it is not worth the effort - it is not going to get in anyway. I sincerely hope that he is wrong in the long run.

          Again, appreciate the input, looks like the issue is at least alive.

          At least it wakes up from time to time, when someone new comes around and complains

          We write all our own JAAS LoginModules today

          We do that too. For our SolrCloud-based solutions we have made a LoginModule persisting its data in ZooKeeper. ZooKeeper is around anyway, and the module has nice features for pushing changes in passwords etc. to the realms in the web-container.
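          The ZooKeeper-backed module described above is proprietary, but the JAAS shape of it is standard. Below is a hedged sketch only: an in-memory map stands in for the ZooKeeper-persisted user store, and every class and option name is made up for illustration, not taken from that module:

```java
import java.util.Map;

import javax.security.auth.Subject;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.FailedLoginException;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

// Hedged sketch of a JAAS LoginModule. A static map stands in for the
// credentials a real module would read from ZooKeeper.
public class MapLoginModule implements LoginModule {

    // Stand-in for the ZooKeeper-persisted user store (example data only).
    static final Map<String, String> USERS = Map.of("solr", "s3cret");

    private CallbackHandler handler;
    private boolean succeeded;

    // Check a username/password pair against the stand-in store.
    static boolean verify(String user, char[] password) {
        String stored = USERS.get(user);
        return stored != null && stored.equals(new String(password));
    }

    @Override
    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.handler = callbackHandler;
    }

    @Override
    public boolean login() throws LoginException {
        NameCallback name = new NameCallback("user: ");
        PasswordCallback pass = new PasswordCallback("password: ", false);
        try {
            handler.handle(new Callback[]{name, pass});
        } catch (Exception e) {
            throw new LoginException(e.toString());
        }
        succeeded = verify(name.getName(), pass.getPassword());
        if (!succeeded) throw new FailedLoginException("bad credentials");
        return true;
    }

    @Override public boolean commit() { return succeeded; }
    @Override public boolean abort()  { return true; }
    @Override public boolean logout() { return true; }

    public static void main(String[] args) throws Exception {
        // Register the module programmatically instead of via a jaas.conf file.
        Configuration cfg = new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String appName) {
                return new AppConfigurationEntry[]{new AppConfigurationEntry(
                        MapLoginModule.class.getName(),
                        AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                        Map.of())};
            }
        };
        // Handler supplying the credentials a container realm would collect.
        CallbackHandler h = callbacks -> {
            for (Callback c : callbacks) {
                if (c instanceof NameCallback) ((NameCallback) c).setName("solr");
                else if (c instanceof PasswordCallback)
                    ((PasswordCallback) c).setPassword("s3cret".toCharArray());
            }
        };
        new LoginContext("app", null, h, cfg).login();
        System.out.println("login ok");
    }
}
```

          In a container deployment the Configuration would come from the container's realm setup rather than being built in code; only the module class itself is reused.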

          I can lock down the receiver side via configuration and standard Container plugins, but it's the sender side that we can do nothing about without some core code modification that would allow us to send whatever security artifacts downstream we deem appropriate

          In order not to make this issue too "wide", I deliberately made it about "basic http auth" only. If you use SolrJ as the client, the patch I provided allows for easy injection of the username/password sent to the server side by the SolrJ client. Basically the patch is about support in SolrJ clients for fetching (from the outside - config or stdin or whatever) security artifacts and then using them in outgoing requests. Since Solr-nodes use SolrJ for sending inter-Solr-node requests, this will make the feature available for inter-Solr-node requests, but at the same time the feature is also available for "clients" using SolrJ. If you use other kinds of clients you will need to deal with the credentials-sending from clients yourself.
          The patch is also "prepared for" other kinds of security artifacts (cookies, RSA certificates or whatever), so that they can be built into the established framework, but currently it only supports basic http auth out of the box - for SolrJ clients as well.
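          For readers without the patch, the mechanical core of "sending credentials in outgoing requests" is just attaching an HTTP Basic Authorization header. A minimal sketch - the class and method names below are mine for illustration, not the patch's actual API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of what a credentials provider hands to the HTTP layer:
// the value of the "Authorization" request header for HTTP basic auth.
// Class and method names are illustrative, not the patch's actual API.
public class BasicAuthHeader {
    public static String value(String username, String password) {
        byte[] raw = (username + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // A client would set this on each outgoing request, e.g.
        // conn.setRequestProperty("Authorization", value(user, pass));
        System.out.println(value("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```

          The container on the receiving side decodes exactly this header, which is why the patch can stay client-side only and leave enforcement to the web-container.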

          Spring Web and Grizzly2

          With respect to going away from using all the nice features provided by web-containers and replacing them with other "frameworks" providing the same features, do not get me wrong. In principle I have nothing against that, as long as we do not believe we can do better ourselves. Use some framework (web-containers or Spring stuff or whatever) to deal with that stuff. And as long as we do that, and do not make it ourselves, I am not sure the argument that "Once Solr is a 'real' application that owns and fully controls the HTTP layer, security will not be such a nightmare" holds. Either we do it ourselves and get "full control", or we use some "framework" - and claiming that the alternative framework (alternative to web-containers) provides more "control" than web-containers do is not necessarily true. But do not get me wrong, I have nothing against using other frameworks than web-containers; I am just not convinced it is worth the effort. Web-containers in general do a great job, and most of them have had years and years to mature.

          David Webster, if you really care about security in SolrCloud, you might also want to take a peek at SOLR-4580. Depending on how much you expose ZooKeeper, how afraid you are of admins etc. accidentally messing up ZK data, and so on, this might also be relevant for you.

          Per Steffensen added a comment -

          Jan Høydahl asked me to help making a patch fitting trunk. Find attached SOLR-4470_trunk_r1568857.patch fitting revision 1568857 of trunk (today).

          Changes in this patch, from the top of my head:

          • Cleaned up so that much less changes are necessary in test-space
            • Introduced manipulateRequest on SolrServers. Currently it is only used in tests, but I guess it is a nice little thing to have in general: a hook that gets the chance to modify any request that goes through a certain SolrServer
            • And introduced TestSolrServers.TestXXXSolrServer's to be used in all tests that run with security set up on the Solr-nodes - they make sure to add the correct credentials to outgoing requests from the test-methods themselves. Remember that SOLR-4470 is not so much about clients being able to penetrate the security-layer of a web-container by providing correct credentials - it is about Solr-node-to-Solr-node requests also being able to do so. (It would be nice to use TestXXXSolrServers in all tests, because it is nice to have such a place to add test-only stuff, but for now they are introduced only for tests running with security added on their jettys)
          • SecurityDistributedTest is now green (Jan Høydahl struggled with that). Had to hook into StreamingSolrServers, introducing the interface SolrServerFactory and setting it from SecurityDistributedTest in order to catch the exceptions to assert on. I believe it is a failure that SolrCmdDistributor does not report errors back to the client - see the FIXME in SolrCmdDistributor
          • By code-inspection alone (I have not tested it) I found an error in StreamingSolrServers - see the FIXME in StreamingSolrServers. Because I have not tested it, I am not sure that what I write in the FIXME is true, but I cannot see why it should not be.
          • Authentication credentials provider factories (pluggable components providing credentials for Solr-node-to-Solr-node requests) are now configurable in solr.xml like this (maybe some would like to have it in cluster-properties instead, but that change should be trivial to make later)
              <security>
                <interSolrNodeRequestAuthCredentialsProviderFactories>
                  <directSubRequest>
                    <str name="class">org.apache.solr.security.UseSuperRequestAuthCredentialsSubRequestFactory</str>
                  </directSubRequest>
                  <internalRequest>
                    <str name="class">org.apache.solr.security.SystemPropertiesAuthCredentialsInternalRequestFactory</str>
                  </internalRequest>
                </interSolrNodeRequestAuthCredentialsProviderFactories>
              </security>
            

            Would like to have made it like below instead, but this thing about int, str etc. seems to be the way it is done here (when in Rome… I guess)

              <directSubRequest class=“org.apache.solr.security.UseSuperRequestAuthCredentialsSubRequestFactory”/>
            
          • I have tried running tests to verify that the entire test-suite is green after my patch, but I simply cannot get an entire test-suite run green - I cannot do that without the patch either, though. I am pretty sure tests do not become “more red” by this patch. I have run a lot of partial tests.
          Jan Høydahl added a comment -

          New trunk patch SOLR-4470.patch fitting on top of r1570113

          • Fixed a test bug
          • Passes ant precommit (Some illegal HTML in javadoc)
          Per Steffensen added a comment -

          BTW, updated the descriptions on http://wiki.apache.org/solr/SolrSecurity to reflect the latest trunk patch. Jan knew that, but others might be following this ticket.

          Per Steffensen added a comment -

          Looking at the latest patch, there seem to be a lot of diffs that are only about replacing one kind of empty line with another. Not sure that belongs here.

          Jan Høydahl added a comment - - edited

          Attaching patch with these changes:

          • No whitespace-only diffs.
          • AuthCredentials.java moved from org.apache.solr.security to below "common": org.apache.solr.common.auth - it feels like a better location. Updated the package.html description for the new package
          • Javadoc for AuthCredentials class
          • CoreAdminRequest#persist() was undeprecated by the previous patch; assuming that was a typo, I re-added the deprecation.
          • In the previous patch, two public static CoreAdminRequest#createCore(...) methods were replaced with variants taking AuthCredentials. Added the originals back so as not to break back-compat for external users.
          • In HttpClientUtil#createClient(...) the code in prev patch was:
                if (Boolean.getBoolean( HTTP_CLIENTS_MUST_ADAPT_TO_CREDENTIALS_CHANGES) && authCredentials != null) {
             

            However, getBoolean on this final string will always be false, changed to what I guess was the intention:

                if (config.getBool( HTTP_CLIENTS_MUST_ADAPT_TO_CREDENTIALS_CHANGES, false) && authCredentials != null) {
             
          • In HttpClientUtil, re-add @Deprecated versions of the old PROP_BASIC_AUTH_USER and PROP_BASIC_AUTH_PASS config params along with these
            two method signatures:
                public static void configureClient(final DefaultHttpClient httpClient, SolrParams config)

            and

                @Deprecated
                public static void setBasicAuth(DefaultHttpClient httpClient, String basicAuthUser, String basicAuthPass)

            and corresponding code in HttpClientConfigurer. This way, 3rd-party code already using these APIs will get a deprecation warning before it has to be recompiled against a later version of Solr.
            Also added a bullet on top of CHANGES.txt saying that people should start using the new way of doing things.

          Btw. This patch (SOLR-4470.patch) is for r1570335

          Jan Høydahl added a comment -

          Attaching new patch fitting on top of current trunk (r1571287).

          Jan Høydahl added a comment -

          Inviting all users/committers to give feedback on other things that need changing before commit to trunk.

          Per Steffensen added a comment -

          No whitespace-only diffs

          Good

          Javadoc for AuthCredentials class

          Nice

          CoreAdminRequest#persist() was undeprecated by previous patch, assuming that was a typo, re-adding deprecation.

          Not by intent from my side, so the re-adding seems correct

          In HttpClientUtil#createClient(...) the code in prev patch was ...

          Boolean.getBoolean was the intention. And the result will not always be false - not if you set the JVM-param to "true". This is done in SecurityDistributedTest so that we can override the credentials used in inter-Solr-node requests from the test. Please revert to Boolean.getBoolean.

          Ad) CHANGES.txt

          ... The feature propagates credentials from outer request to the sub-requests ...
          

          I can live with this one, but maybe it should be expressed differently. What you describe is only true if you use UseSuperRequestAuthCredentialsSubRequestFactory - you can make other implementations of SubRequestFactory where it is not true. And it is only true for "internal requests that are a direct reaction to a request from the outside". For "internal requests that are not triggered by a request from the outside" it can never be true - there is no outer request to propagate credentials from. For such requests, if you use SystemPropertiesAuthCredentialsInternalRequestFactory, you can control basic http credentials through JVM-params, but here you can implement your own solution as well. Just wanted to make that clear. If you want to keep what you wrote, that is ok with me.
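          The two credential sources described above can be sketched roughly like this (a hypothetical illustration: the interface and class shapes are modeled on the patch's terminology, not its actual APIs; the system-property names are the ones the patch uses):

```java
// Hypothetical sketch of the two credential sources for inter-Solr-node requests.
interface AuthCredentialsSource {
    String user();
    String pass();
}

// Direct sub-request: forward the credentials of the triggering "super" request
// (the behavior of UseSuperRequestAuthCredentialsSubRequestFactory).
class SuperRequestCredentials implements AuthCredentialsSource {
    private final String user, pass;
    SuperRequestCredentials(String user, String pass) { this.user = user; this.pass = pass; }
    public String user() { return user; }
    public String pass() { return pass; }
}

// Internal, non-rooted request: there is no outer request, so fall back to
// configured credentials (SystemPropertiesAuthCredentialsInternalRequestFactory
// reads them from JVM params; defaults here are illustrative only).
class InternalRequestCredentials implements AuthCredentialsSource {
    public String user() { return System.getProperty("internalAuthCredentialsBasicAuthUsername", "internal"); }
    public String pass() { return System.getProperty("internalAuthCredentialsBasicAuthPassword", "secret"); }
}
```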

          Jan Høydahl added a comment -

          Boolean.getBoolean was the intention.

          Sorry, too quick there - had no idea that method actually gets a System property for you.

          Ad) CHANGES.txt

          Fair enough, rephrased it somewhat:

          * SOLR-4470: Support for basic http auth in internal Solr requests. This allows full
            use of basic auth for inter-node communication in a SolrCloud cluster. The feature
            also makes it possible to configure simple authorization. (Per Steffensen via janhoy)
          

          Will upload new patch

          Per Steffensen added a comment -

          Boolean.getBoolean was the intention.

          Sorry, too quick there, had no idea that method actually gets a System property for you

          That is understandable, I also always get confused
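          For anyone else who gets confused: Boolean.getBoolean(name) does not parse its argument - it looks up a system property named name and parses that property's value. A minimal demonstration (the property name is made up for the example):

```java
public class BooleanGotcha {
    public static void main(String[] args) {
        System.setProperty("solr.test.flag", "true");

        // Reads the system property "solr.test.flag" and parses its value:
        System.out.println(Boolean.getBoolean("solr.test.flag")); // true

        // There is (normally) no system property literally named "true":
        System.out.println(Boolean.getBoolean("true")); // false

        // Boolean.parseBoolean parses the string itself:
        System.out.println(Boolean.parseBoolean("true")); // true
    }
}
```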

          Show
          Per Steffensen added a comment - Boolean.getBoolean was the intention. Sorry, too quick there, had no idea that method actually gets a System property for you That is understandable, I also always get confused
          Hide
          Jan Høydahl added a comment -

          New patch SOLR-4470.patch fitting on current trunk r1576004

          Any final comments or wishes before commit?

          Jan Høydahl added a comment - - edited

          (Per Steffensen I did not complete my comment earlier)

          Checking *TODO*s in the patch

          In SystemPropertiesAuthCredentialsInternalRequestFactory

          TODO since internalAuthCredentials is something you use for "internal" requests against other Solr-nodes it should never have different values for different Solr-nodes in the same cluster, and therefore the credentials ought to be specified on a global level (e.g. in ZK) instead of on a "per node" level as VM-params are

          Guess this should not be a TODO here, since we're inside the SysProp impl; rather, we should open a JIRA for a ZK auth-cred impl once this gets in

          In SecurityDistributedTest:

          // TODO It ought to have been 403 below instead of -1, but things are just crappy with respect to 403 handling around the code
          doAndAssertSolrExeption(-1 /*403*/, new Callable<Object>() {
          

          Do you remember why you put this comment instead of cleaning up the code?

          /* TODO Seems like the single control-node is sending requests to itself in order to handle get!?
                controlClient.query(params, METHOD.GET, SEARCH_CREDENTIALS);*/
          

          Dead code, better remove it, or is there something to clarify?

          // TODO: REMOVE THIS SLEEP WHEN WE HAVE COLLECTION API RESPONSES
          Thread.sleep(10000);
          

          There is SOLR-4577, which seems to be fixed already; can we perhaps spin off another JIRA to add a return status from AbstractFullDistribZkTestBase#createCollection() and friends? That way we can avoid adding another 10s of sleep to this test

          In AuthCredentials

          // TODO we ought to test if authMethods is already unmodifiable and not wrap it if it is, but I hope/guess
          // Collections.unmodifiableSet will do that internally - I found no way to test if a Set is unmodifiable
          this.authMethods = Collections.unmodifiableSet(authMethods);
          

          Does anyone know if this is safe? If so, better to remove the whole TODO
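          For what it's worth, double-wrapping is harmless apart from an extra delegation layer: modifying either view throws UnsupportedOperationException, and there is indeed no public way to ask a Set whether it is unmodifiable. A small demonstration (the set contents are illustrative):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class UnmodifiableSetDemo {
    public static void main(String[] args) {
        Set<String> authMethods = new HashSet<>();
        authMethods.add("BasicHttp");

        Set<String> once = Collections.unmodifiableSet(authMethods);
        Set<String> twice = Collections.unmodifiableSet(once); // extra wrapper, still safe

        try {
            twice.add("Digest");
        } catch (UnsupportedOperationException expected) {
            System.out.println("read-only, as expected");
        }

        // Caveat: the views are read-only, but changes made directly to the
        // backing set still shine through both wrappers.
        authMethods.add("Digest");
        System.out.println(once.contains("Digest")); // true
    }
}
```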

          Per Steffensen added a comment -

          About the TODO above. I agree! But it can be a step 2.

          Jan Høydahl added a comment - - edited

          I think at least we should fix the sleep(10000) as a separate JIRA before committing this patch.
          Created SOLR-5853 for this.

          Jan Høydahl added a comment - - edited

          Guess this should not be a TODO here since we're inside the SysProp impl

          I moved the meat of the comment into the JavaDoc on the class itself.

          I found no way to test if a Set is unmodifiable

          Try to modify it and catch UnsupportedOperationException. However, I find no harm in always wrapping, so I'll just remove the TODO.

          Thread.sleep(10000);

          Since the Collection API already waits for completion as of SOLR-4577, the 10s sleep can go away right away; when SOLR-5853 is done, we can also add an extra assert on the response.

          Comments wanted

          Any comments on the doAndAssertSolrExeption(-1 /*403*/... part or the dead code snippet?

          Other changes I'm doing in the next version of the patch:

          • Moving the solr.xml factory-class config from solr/security/insanely-long-and-complicated-tagname/xxxRequest... to solr/authentication/subRequestFactory/str[@name='class'] - this way it is actually possible to remember the path. Edit: keeping it at top level, since auth may be wanted outside of SolrCloud as well, e.g. for replicationHandler
          • Simplifying the basic auth sysProperty names from internalAuthCredentialsBasicAuthUsername and internalAuthCredentialsBasicAuthPassword to solr.auth.user and solr.auth.pass
          • Randomizing BaseDistributedSearchTestCase.RUN_WITH_COMMON_SECURITY in @BeforeClass, but it will of course still always be active for the tests in SecurityDistributedTest
          Per Steffensen added a comment -

          Sorry for being so slow getting back to you, but I am very very busy right now.

          // TODO It ought to have been 403 below instead of -1, but things are just crappy with respect to 403 handling around the code
          doAndAssertSolrExeption(-1 /*403*/, new Callable<Object>() {
          

          Do you remember why you put this comment instead of cleaning up the code?

          It ought to be 403 because 403 is the code we use for unauthorized. The reason it is not propagated all the way through to the client (so we cannot assert on it) is that LBHttpSolrServer retries on 403 (RETRY_CODES.add(403)). When doing retries, the internal exceptions are not propagated correctly. Why LBHttpSolrServer wants to retry on 403 is beyond me, but there is probably a good reason. If there is a good reason, then the fix should be to make sure internal exceptions are propagated correctly even on retries. If there is no good reason, I guess "RETRY_CODES.add(403)" could be removed from LBHttpSolrServer - if you do that, you can change the -1's to 403's in SecurityDistributedTest and it will be green. I did not dare to struggle with propagating exceptions correctly under this issue/ticket, so for now I just accept that we cannot make a really good assert on the error-code in SecurityDistributedTest in this case.

          /* TODO Seems like the single control-node is sending requests to itself in order to handle get!?
          controlClient.query(params, METHOD.GET, SEARCH_CREDENTIALS);*/
          

          Dead code, better remove it, or is there something to clarify?

          One of the general ideas throughout SecurityDistributedTest is to show that errors occur when wrong credentials are used in inter-Solr-node requests. In each case I demonstrate that, even though wrong credentials will be sent in inter-Solr-node requests, we can still carry out the operation on controlClient, because it is not part of a distributed setup (it will never send inter-Solr-node requests). This pattern is used throughout the test, but it did not work for real-time gets, because controlClient got unauthorized exceptions - I concluded that this was because a Solr-node would always handle real-time gets by forwarding the request to the node supposed to hold the document, EVEN if that node is the node receiving the request and doing the forwarding.
          I have just tried it, and it seems this is not a problem anymore, so the code-lines can actually be activated - make the following change in both cases. Change from

          /* TODO Seems like the single control-node is sending requests to itself in order to handle get!?
          controlClient.query(params, METHOD.GET, SEARCH_CREDENTIALS);*/
          

          to

          controlClient.query(getParams, METHOD.GET, SEARCH_CREDENTIALS);
          

          But this might not work on the 4.x branch when backporting. I do not know when this forwarding-real-time-gets-to-yourself issue was fixed, or whether the fix is on branch_4x.

          David Webster added a comment -

          It has been a while since I commented, as we have been busy with a lot of things, including making our own alterations to SOLR to add inter-node security. However, I have been following this thread all along.

          We could not use the actual patch code here because our developers had already implemented production features requiring the latest GA release. We also had to implement security using our SiteMinder-based authentication, not BASIC auth. We took a bit of a different tack than this approach and did 95% of the security work outside the core SOLR code.

          The only class we touched was the HttpClientUtil code, where we added the same basic four lines or so of code this patch does. Everything else happens outside of SOLR core code. We add an external jar of our own to the SOLR war that makes a SiteMinder auth request to a Policy Server using a password from our CyberArk vault. We attach the SM token as a cookie on the request; on the receiving side, we implemented a new JAAS LoginModule as a Tomcat Valve sitting in front of SOLR. The valve asserts the token on a new session, or when the token does not match the one in the cache, and it does an LDAP role lookup too. We had to make a web.xml change adding the standard login-config and security-role tags, as one would for BASIC auth. The info is cached: auth assertion requests are only made on the server side when the token or session expires/changes, and on the client side only when the server passes back a 403.

          Extensive load testing indicates a near negligible overhead on SOLR writes or reads.

          Jan Høydahl added a comment -

          Cool. A major part of this patch is actually test code, plus a framework for plugging in your own custom credentials provider. The auth enforcement code lives almost 100% in the servlet container; we just need to make sure to pass the right tokens on the requests.

          Please share any comments if you see at this stage that the approach taken in this patch would in any way make it hard to plug in SM, even if you yourself have taken a different approach for your environment.

          Jan Høydahl added a comment -

          A new version of the patch is uploaded; it fits with current trunk r1577444.

          Changes in addition to the ones mentioned above:

          • Removed 10s sleep and assert on return value from createCollection (SOLR-5853) - tests pass
          • Re-enabled dead code controlClient.query(getParams, METHOD.GET, SEARCH_CREDENTIALS); - tests pass!
          • Renamed MyConcurrentUpdateSolrServer to ErrorHandlingConcurrentUpdateSolrServer
          • Changed to using diamonds <> in the patch to align with recent changes in trunk

          Currently having a discussion in SOLR-5220 to fix return of true error from LBHttpSolrServer, but that should not hold up this patch.

          We should create JIRAs for the two remaining FIXMEs in the patch (SolrCmdDistributor) and replace each comment with a reference to its JIRA. Per Steffensen, perhaps you already filed these two?:

            // FIXME Here it is a problem using StreamingSolrServers which uses ConcurrentUpdateSolrServer for its requests. They (currently)
            // do not respond errors back, and users of SolrCmdDistributor actually ought to get errors back, so that they can eventually
            // be reported back to the issuer of the outer request that triggers the SolrCmdDistributor requests.
            // E.g. if you issue a deleteByQuery() from a client you will not get any information back about whether or not it was actually
            // carried out successfully throughout the complete Solr cluster. See workaround in SecurityDistributed.doAndAssertSolrExeptionFromStreamingSolrServer
          

          and

          server = currentSolrServerFactory.createNewClient(url, httpClient, 100, 1, updateExecutor, true,
    req, // FIXME Giving it the req here to use for created errors will not work, because this req will
                   // be used on errors for all future requests sent to the same URL. Resulting in this first req
                   // for this URL to be resubmitted in SolrCmdDistributor.doRetriesIfNeeded when subsequent
                   // different requests for the same URL fail
              errors);
          
          Per Steffensen added a comment -

          Keep up the good work, Jan!

          We should create JIRA's for the two remaining FIXME's in the patch (SolrCmdDistributor) and replace the comment with a reference to the JIRA, Per Steffensen perhaps you already filed these two?

          No, I did not. I put in the FIXMEs when I stumbled over the problems during my development, but never got around to filing JIRAs. Sorry. I expected that the FIXMEs would make me/us remember - and it worked.

          Jan Høydahl added a comment -

          In SolrServer.java this patch adds the following new public methods:

          public UpdateResponse add(Collection<SolrInputDocument> docs, int commitWithinMs, AuthCredentials authCredentials) throws SolrServerException, IOException {
          public UpdateResponse addBeans(Collection<?> beans, int commitWithinMs, AuthCredentials authCredentials) throws SolrServerException, IOException {
          public UpdateResponse add(SolrInputDocument doc, int commitWithinMs, AuthCredentials authCredentials) throws SolrServerException, IOException {
          public UpdateResponse commit( boolean waitFlush, boolean waitSearcher, boolean softCommit, AuthCredentials authCredentials ) throws SolrServerException, IOException {
          public UpdateResponse optimize(boolean waitFlush, boolean waitSearcher, int maxSegments, AuthCredentials authCredentials ) throws SolrServerException, IOException {
          public UpdateResponse rollback(AuthCredentials authCredentials) throws SolrServerException, IOException {
          public UpdateResponse deleteById(String id, int commitWithinMs, AuthCredentials authCredentials) throws SolrServerException, IOException {
          public UpdateResponse deleteById(List<String> ids, int commitWithinMs, AuthCredentials authCredentials) throws SolrServerException, IOException {
          public UpdateResponse deleteByQuery(String query, int commitWithinMs, AuthCredentials authCredentials) throws SolrServerException, IOException {
          public SolrPingResponse ping(AuthCredentials authCredentials) throws SolrServerException, IOException {
          public QueryResponse query(SolrParams params, METHOD method, AuthCredentials authCredentials) throws SolrServerException {
          public QueryResponse queryAndStreamResponse( SolrParams params, StreamingResponseCallback callback, AuthCredentials authCredentials ) throws SolrServerException, IOException
          

          We discussed it earlier in this issue, and I believe we then concluded that we could keep them as protected for now, so as not to inflate the public API surface in a non-scalable way. If there are no objections, I'll make these protected.

          Also the new patch removes some unrelated changes which were done in a separate JIRA (SOLR-5863).

          Per Steffensen If you find the time to report the FIXME's, that would be great.

          Jan Høydahl added a comment -

          Attached a new patch for r1577540, making the convenience methods package-local, and correspondingly public in the relevant Test*SolrServer classes.

          Jan Høydahl added a comment -

          Once there are JIRAs for the two FIXMEs, we can change those comments a bit, or perhaps fix the root problem first if it is easy.

          This would be an excellent time to get further feedback from others on the patch, so we can aim for committing some time next week perhaps.

          Ryan Ernst added a comment -

          A few questions on the patch (I've been loosely following this issue since last year, but haven't really looked at any patches until now):

          • Why do we need our own custom AuthCredentials interface? Could we use something from JAAS?
          • Do we really need to pass credentials around everywhere? What David Webster described sounds like a much cleaner approach?
          Jan Høydahl added a comment -

          Why do we need our own custom AuthCredentials interface? Could we use something from JAAS?

          "Public and private credential classes are not part of the core JAAS class library. Any class can represent a credential." (link)

          Do we really need to pass credentials around everywhere? What David Webster described sounds like a much cleaner approach?

          To support pure "allow-all" internal requests we could do without passing auth on as many methods. But this patch also supports propagating creds from outer requests, so that the app-server can be set up to e.g. allow queries from user A, updates from user B and collection creation from user C, based on their username/password. See https://wiki.apache.org/solr/SolrSecurity for more.

          The patch submitter had this requirement for a concrete customer currently in production with this patch, and in my opinion that's an excellent background for a patch. If there are ways to support the same functionality while reducing the patch footprint, that would of course also be nice.
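
          As a concrete illustration of the container-side half of that setup, enforcement along the lines of the SolrSecurity wiki page might look roughly like this in web.xml (a hedged sketch only; the url-pattern and role names are illustrative and not taken from the patch):

```xml
<!-- Sketch: restrict update requests to users holding an illustrative
     "update-role"; authentication itself is BASIC, handled by the container. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr update</web-resource-name>
    <url-pattern>/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>update-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>
<security-role>
  <role-name>update-role</role-name>
</security-role>
```

          With separate constraints per url-pattern, the container can then allow queries, updates and collection administration for different users, as described above.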

          David Webster added a comment - - edited

          We took advantage of the fact that SOLR is deployed to Tomcat. Tomcat, being the reference implementation of the servlet standard, lets you use JAAS LoginModules, which essentially intercept the request stream before it even gets to the application layer, perform authentication, and provide authorization tokens for use in the app layer. That's what we did. The problem is on the sending side (client). There, actual core code has to be altered to either add existing auth into the stream, or perform auth and then add it to the stream. That involved four lines of code in the actual SOLR code, coupled with a custom auth jar we bind into the WAR.

          Now, if they ever move SOLR out of the servlet container into a standalone implementation, we have a problem with our approach and would have to take this patch's full approach.

          Per Steffensen added a comment -

          Regarding the following FIXME in SolrCmdDistributor

          // FIXME Here it is a problem using StreamingSolrServers which uses ConcurrentUpdateSolrServer for its requests. They (currently)
          // do not respond errors back, and users of SolrCmdDistributor actually ought to get errors back, so that they can eventually
          // be reported back to the issuer of the outer request that triggers the SolrCmdDistributor requests.
          // E.g. if you issue a deleteByQuery() from a client you will not get any information back about whether or not it was actually
          // carried out successfully throughout the complete Solr cluster. See workaround in SecurityDistributed.doAndAssertSolrExeptionFromStreamingSolrServer

          The text is a little misleading. Actually SolrCmdDistributor, StreamingSolrServers and ConcurrentUpdateSolrServer do collect errors and make them available to the components using them. The problem is that DistributedUpdateProcessor.doFinish does not report the errors back to the outside client in case there is more than one error (see the comment "// TODO - we may need to tell about more than one error..." in DistributedUpdateProcessor.doFinish).

          The two places in SecurityDistributedTest that use doAndAssertSolrExeptionFromStreamingSolrServer expect to get an exception back, but do not, because both subrequests made internally by SolrCmdDistributor fail, and therefore DistributedUpdateProcessor does not report the errors back at all. Therefore I made the hack to pick up the exceptions at the StreamingSolrServers level, so that in SecurityDistributedTest I can actually assert that both inner requests fail. I do not know if there is more to report - because of the "// TODO - we may need to tell about more than one error..." comment, I expect this has already been reported.

          It is a little hard to fix, because you need infrastructure that is able to report multiple errors back to the client. We already have that in our version of Solr (we created that infrastructure when we implemented optimistic locking, in order to be able to get e.g. 3 version-conflict errors back when sending a multi-document update of 10 documents where 3 failed and 7 succeeded), but it is a long time since we handed it over to Apache Solr (see SOLR-3382). I guess there is nothing left to report - I have no problem with you just deleting this FIXME.

          Per Steffensen added a comment -

          Regarding the following FIXME in StreamingSolrServers

          // FIXME Giving it the req here to use for created errors will not work, because this req will
          // be used on errors for all future requests sent to the same URL. Resulting in this first req
          // for this URL to be resubmitted in SolrCmdDistributor.doRetriesIfNeeded when subsequent
          // different requests for the same URL fail

          See SOLR-5939

          Per Steffensen added a comment - - edited

          Do we really need to pass credentials around everywhere?

          They really are not passed around

          What David Webster described sounds like a much cleaner approach?

          Do you really think so!?! IMHO it sounds much more complicated. The patch for SOLR-4470 is really very simple with respect to changes in non-test code; it makes it very easy to set up addition of credentials to outgoing requests, it does not require any components other than Solr running in your infrastructure, and it does not require including and configuring any 3rd-party libraries.

          The problem is on the Sending side (client)

          This patch only deals with the sending side. The actual authentication and authorization of incoming requests is not handled by Solr (and probably shouldn't be). As long as we run in a servlet container (e.g. Tomcat, but also any other certified servlet container), you can set up your security in web.xml and use common LoginModules (or write your own if you want to).

          That involved four lines of code in the actual SOLR code ...

          This patch is not much more than those four lines, except that we support setting up credentials in solr.xml and "calculating credentials" given the "super-request". Besides that, just thorough testing.

          Now, if they ever move SOLR out of the Servlet container into a stand alone implementation.....we have a problem with our approach, and have to take this patch's full approach.

          If they move Solr out of the servlet container, hopefully they will support some other way of setting up protection against incoming requests. But this patch will not help you there - this issue is only about adding credentials to outgoing requests.
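
          For readers wondering what "adding credentials to outgoing requests" amounts to in basic auth terms, the header those few client-side lines produce can be sketched with the JDK alone (an illustrative sketch; the class name and credential values are made up, not code from the patch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {
    // Build the value of the HTTP Authorization header for basic auth:
    // "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // A Solr node would attach this header to every internal sub-request,
        // either forwarding the outer request's credentials or falling back
        // to configured "internal credentials".
        System.out.println(basicAuthHeader("user", "pass"));
        // prints: Basic dXNlcjpwYXNz
    }
}
```

          The receiving side never sees anything Solr-specific: the servlet container decodes and verifies the same header, which is why enforcement can stay entirely in web.xml.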

          Per Steffensen added a comment -

          Sorry about the slow response, Jan Høydahl, but I am very busy making sure Solrs do not lose their ZooKeeper connections under high load.

          Jan Høydahl added a comment -

          I'll be away for a few weeks, feel free to continue to improve the patch while I'm away

          David Webster added a comment -

          Just a few clarifying comments. Yes, likewise with this patch, we only did a very minor mod to the sending side. However, we have a fairly complicated way of obtaining the credentials to place on the outbound side if the cached ones are no longer valid. That required binding in a custom jar (as opposed to touching additional core SOLR code) containing the logic for interacting with our SiteMinder and CyberArk infrastructure to get them. Other than that, we made no mods on the receive side; that is all a Tomcat JAAS LoginModule, nothing to do with SOLR.

          Hopefully they rethink their plan of moving to a standalone implementation of some sort, because we are quite confident now that we will have little trouble moving from version to version in the future, as the small change to core code is in a place that should never change.

          Forest Soup added a comment -

          Does anyone have an idea when this will be released? Thanks!

          Per Steffensen added a comment -

          Does anyone have an idea when this will be released? Thanks!

          My best guess is never, unfortunately. But you can build your own version of Solr including it - we do that. I can give you a few hints on how to build your own Solr if you want; I do not have the time to give a thorough description, but a few hints I can manage. Let me know. I do not know if it is actually described in detail somewhere?

          Per Steffensen added a comment -

          But voting for it might help

          Shai Erera added a comment -

          Hi Per. I've skimmed briefly through the history of this issue, but didn't read it thoroughly. I noticed that some of the comments were about the size of the patch. I tend to agree with such comments in general, although sometimes a patch just needs to be big. Reviewing the patch very briefly, I noticed that there are many changes due to the addition of authCredentials to SolrServer API methods (I counted around 600+ lines in the patch, which is ~10%). I have a few questions about that:

          • Is it correct that every SolrServer can handle authCredentials? And not e.g. only HttpSolrServer.
          • Is it mandatory to nearly double the API with this extra parameter? I guess this is a general question about usability of the update API. We could probably refactor things to look more friendly, e.g. introduce an UpdateRequest with fluid interface, something like: UpdateRequest.add().doc(doc).commitWithin(-1).authCredentials(creds). Then SolrServer would just have a single .execute() method which takes an UpdateRequest and processes it.
            • This change is unrelated to this issue, but if we do this separately, it might simplify this patch a bit.
            • Even without refactoring though, I wonder if we cannot just tell people who need to set authCredentials to build an UpdateRequest themselves, since I assume this code path won't be that common among our users. Then we don't put the authCredentials in the face of all users.
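The fluid-interface idea above could look roughly like this. A minimal sketch only: UpdateRequest2 and its methods here are hypothetical illustrations of the proposal, not an existing SolrJ API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed fluid update API.
// UpdateRequest2 does not exist in SolrJ; it illustrates the suggestion above.
class UpdateRequest2 {
    private final List<Object> docs = new ArrayList<>();
    private int commitWithin = -1;          // -1 means "no commitWithin"
    private String authCredentials;          // null means "no credentials set"

    static UpdateRequest2 add() { return new UpdateRequest2(); }

    UpdateRequest2 doc(Object doc) { docs.add(doc); return this; }
    UpdateRequest2 commitWithin(int ms) { this.commitWithin = ms; return this; }
    UpdateRequest2 authCredentials(String creds) { this.authCredentials = creds; return this; }

    int docCount() { return docs.size(); }
    String credentials() { return authCredentials; }
}
```

A caller would then write something like `UpdateRequest2.add().doc(myDoc).commitWithin(-1).authCredentials(creds)` and hand the result to a single `SolrServer.execute(...)` method, instead of the API growing one overload per parameter combination.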

          On a personal note, even if I wanted to help you get this committed, it's impossible for me to digest and decently review a 6300 lines patch. I think we ought to be able to break it down into smaller manageable commits. For example, can we add basicAuth to Solr/Cloud separately from Solrj? Yes, perhaps it will be more usable for Solrj users, but it doesn't mean it has to go in the same commit.

          And on another note, I share your view about the need to modernize (pretty word for 'refactoring') the Solr API, but I think it will be accepted better if we do this piecemeal. For instance, if we want to improve SolrServer API (by adding this fluid UpdateRequest), we can start with writing UpdateRequest2 with the fluid API that we think it should have. We should even be able to commit it (gradually), since at first it won't be used by the server. We then make a series of commits (this is just an example):

          1. Add UpdateRequest2 with addDocuments support
          2. Add rollback to UpdateRequest2
          3. Add AuthCredentials
          4. Add ....
          5. Add SolrServer.execute(UpdateRequest2)

          After everything works well and is properly tested, and especially after that last commit, we can rip out UpdateRequest in exchange for UpdateRequest2. I personally wish we didn't need to do all the API back-compat, but this should be an easy change – deprecate everything on 5x, remove on trunk and delegate all 5x deprecated APIs to UpdateRequest2. BTW, I don't propose to stick w/ UpdateRequest2, it's just an example.

          Not that I expect you to do all this, but I think if e.g. the API looked like that, then adding authCredentials to Solrj would be much easier, and looked 'cleaner'.

          Per Steffensen added a comment -

          I noticed that some of the comments were around the size of the patch

          Yes, unfortunately I have provided a significant number of features/patches that are fairly big that never got committed.
          It is not that I really want to do it this way. I would much rather, as I sense it's most often done, start out with rough half-working patches, and "design the solution" by adding to and discussing the patches as they evolve.
          Unfortunately that just does not fit very well with the way my customer expects me to work. He tells me (in this case) "I want to protect my Solrs with username/password on HTTP level. You have two weeks to do it". I'd better get started. I make a solution the way I think it should be, and do not have the time to discuss it too much with the rest of you. When I have made a working solution for the customer, I feel obligated to at least try to give it back to Solr - if you want to use Open Source, make sure to (try to) give back whenever you can.

          This way I end up providing a fully functional complete solution. I understand that this is hard to grasp for you guys. But I also believe that I am often mistaken. I am not saying that you just have to commit this without any change, dialog or whatever. I just want you to pretend that we are in the beginning of solving this ticket, and just look at the full solution from me as a suggestion on how to solve it, that at least can be used as a starting point for a discussion and "design process". After I have made the complete solution my customer is happy, and then I have the time to discuss/design "the real" solution with you guys. But it usually never gets that far - no committer really wants to jump into it. Then I just say to myself that I did my duty and forwarded my solution to the community. If no one wants to participate I cannot do much more.

          The thing about the discussion/design process is that it is very CALENDAR-time consuming. It is not that I will spend that much actual time on it - usually you just add a few comments, change the patch a little and wait for comments, while you do other stuff (e.g. implementing the next feature for the customer ). So I will have the time to participate in this "after full solution"-phase of making the "right" solution for Solr.

          Is it correct that every SolrServer can handle authCredentials? And not e.g. only HttpSolrServer

          Yes

          Is it mandatory to nearly double the API with this extra parameter?

          No. Actually I thought Jan Høydahl removed the added API methods

          UpdateRequest.add().doc(doc).commitWithin(-1).authCredentials(creds)

          Yes, the builder-approach is IMHO in general a much better approach compared to having numerous methods with all combinations of parameters, or having one method taking all parameters where "special values" (like null, -1 or something) mean that you just want the default on that one. But I don't see that done in Solr. When in Rome, I guess...

          Then we don't put the authCredentials in the face of all users

          You are probably right

          I think we ought to be able to break it down into smaller manageable commits

          Yes, I am sure that can be done. Seems you have a lot of good ideas on how to cut the cake. In regards to this particular SOLR-4470, I think you are focusing too much on the API changes, though. It is a fairly small part of the patch. So if we want to cut SOLR-4470 into smaller patches we need more cutting that just "API stuff" and "the rest". But in general I like all of your ideas in this area.

          Mark Miller added a comment -

          I personally wish we didn't need to do all the API back-compat

          If you can focus on the API breaks in isolated ways first, the upcoming 5.0 is a perfect time to break.

          Shai Erera added a comment -

          No. Actually I thought Jan Høydahl removed the added API methods

          Hmm, I downloaded the latest SOLR-4470.patch, which seems to be the latest of all other patches too, dating 14/Mar/14.

          I think you are focusing too much on the API changes, though. It is a fairly small part of the patch.

          You're right. But that's the first thing that caught my eye - at least 10% of the patch seems to be involving these API changes. So in trying to help understand how this patch can be split into (hopefully much) smaller patches, I saw this as a low hanging fruit.

          When I started my way in this community, I used to think like you - I've done some work, so let me just drop it into JIRA and get a committer to commit it. Not because it's perfect, but because it's "mature" and "product-ready" - we can always iterate and improve afterwards. Over the years I've learned that it doesn't work that way - small patches require less time of a committer to review and approve.

          This is also in general how we develop software. I'm sure you didn't code all of this in a single stroke. And I bet that in your private SVN (or whatever you use), these changes were done over multiple commits. To you, this is one piece of work ready to get in. To the community, it's a giant blob that will be committed without a decent chance to review it thoroughly, and explore alternatives. It's just too overwhelming to think about it from the start, like we would have if you e.g. opened a JIRA and said "hey, I want to add basic-auth to Solr, let's discuss".

          So maybe one suggestion I would make is - don't give up on this patch. Use this issue as an umbrella issue for adding basic-auth to Solr. Tell us "here's the entire piece of work that I've done, now I want to do it in steps (baby steps preferred)". And then open subsequent issues to address different places of the code that involve basic-auth. Even if this work must be committed as a whole to trunk, that's fine. We'll do this work entirely in a branch, but still - multiple issues. That way, each commit gets its proper review. When it's merged to trunk, we're sure of the design, the tests and everything else. I didn't get too deep into it, but I do believe that it can be done on trunk though, in baby steps.

          Remember, basic-auth can exist in the code "hidden", until we feel it's ready. So e.g. it's OK if the first commit only adds basic-auth to UpdateRequest, then to HttpSolrServer, then to CloudSolrServer etc ... they don't all need to support it from the start.

          David Webster added a comment -

          Shai, we took a bit of a different approach than Per did. This patch is much more complete and robust than our solution, as it deals with securing the server (receiving) side of the SOLR cloud at the API level. Instead of relying on the complete patch version to secure our SOLR cloud, we simply added our internal auth mechanism on the client side by modifying four lines of code on the HTTPClient side, and then shelled all the security on the server side out to a JAAS TomcatLogin module implemented as a Tomcat Valve. We were able to do that because SOLR ships as a WAR deployable to a servlet container. If they break that model, we are broken....again. Per's patch, I don't believe, would be affected by that architectural change.

          The benefit to us is that now when SOLR releases happen, all we have to do patch is four lines of client side code and we can adopt the body of the rest of the SOLR upgrade, as-is.

          David Webster added a comment -

          In other words, we focused solely on the auth security....this patch is much deeper than just that....

          Per Steffensen added a comment - - edited

          Thanks for participating David Webster!!!

          I am not sure you really understand what my patch is about. It ONLY deals with adding credentials to outgoing requests in order to be able to penetrate a security-layer on the server-side. This is necessary if you choose to somehow put a security-layer between the Solr client and the Solr server. This patch has NOTHING to do with setting up that security-layer around the server. Webcontainers do that very well for you. And if Solr really wants to move away from being a webapplication that has to run in a webcontainer, there are numerous other frameworks out there that you can use to set up a security-layer around your server. But this patch has nothing to do with how you add that security-layer around your server. You keep talking about JAAS LoginModules etc., and they really are concepts belonging in this security-layer around the server - that's why I believe you do not completely follow my patch.

          Changing the SolrJ client so that it adds "hardcoded" credentials to all outgoing requests is really just a few lines around the HTTPClient-usage that you talk about. In this area, my patch really is not more than those few lines, except that I support doing it at several levels

          • Setting up the credentials once and for all on HTTPClient level, so that it will add those credentials to all requests sent through it
          • Doing the same on SolrJ XXXSolrServer (clients) level - basically doing it on the HTTPClients of the HTTPSolrServers used directly or underneath more complicated XXXSolrServers like CloudSolrServer
          • It also supports setting credentials on request-level, allowing you to use different credentials for different requests sent through the same HTTPClient, without having to change the "global" credentials on HTTPClient level

          But all of this is not that many lines of code
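At the HTTP level, "adding credentials to an outgoing request" with basic auth boils down to setting an Authorization header. A self-contained sketch of just that step (the solr/SolrRocks credentials are placeholders, not anything from the patch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {
    // Build the value of the HTTP "Authorization" header for basic auth:
    // "Basic " followed by base64("username:password").
    static String value(String username, String password) {
        String token = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, `BasicAuthHeader.value("solr", "SolrRocks")` yields `Basic c29scjpTb2xyUm9ja3M=`. The three levels listed above differ only in where this happens: once on the HTTPClient (every request through it gets the header), on the SolrJ client wrapping that HTTPClient, or per individual request without touching the client-wide setting.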

          The thing is that in a Cloud setup, the Solr server itself is a Solr client, sending requests to other Solr servers. If there is a security-layer between the Solr servers (and there will be if you use e.g. webcontainer managed security) you need to somehow tell the Solr servers which credentials to add to their outgoing requests targeting other Solr servers. My patch is a lot about this aspect. You can fairly easily make it in a way so that it just always uses the same "hardcoded" credentials on those outgoing requests. But you might want to configure those fixed credentials from the outside. What if you actually want to use different credentials depending on the kind of the outgoing request? What if the outgoing request is actually triggered from an incoming request - do you want to use the credentials from the incoming request on the outgoing request? E.g. if an outside client sends a distributed query R1 to Solr server A, and Solr server A has to send sub-requests R2 to Solr server B - should it copy the credentials from R1 to R2 or should it use configured/hardcoded credentials for R2?
          You might have very different "algorithms" for Solr servers to decide on credentials for the requests it sends to other Solr servers. In some cases the Solr server might need to fetch the credentials from an outside agency, or you might have a handshake/negotiating-step (like SOLR-6625) or something. Most of the non-test-code in my patch is about making "the component that calculates credentials for requests going out of Solr servers" pluggable.

          But most of the patch is actually test-code. I tend to want to test my stuff thoroughly. In Solr tests we start actual Solr servers sending real requests to each other. So why not test this entire thing by setting up webcontainer managed security in the embedded jetty-containers running the Solr servers in test. Some of the code juggles those things.

          David Webster added a comment -

          Yes, we evaluated the entire patch set and realized all we needed to implement something meeting our requirements was about four lines of new code at the HTTPClient level, that calls out to another external auth module (an outside agency), when needed, to refresh credentials. Not really "hardcoded" at all, but within the narrow context of the client, I suppose you could call it that. And yes, on the receiving end all security is offloaded to a JAAS module; doable because these nodes are housed in Tomcat. If it's a new request coming into the processing cloud or the token has expired, it gets a new credential from the "outside agency". Rinse and repeat.

          For your example, an outside client sending in a query (R1), that then has to propagate a sub request to R2 to Server B. R1 will hit a "mediator". That mediator will attempt to send the query to Server A. That will hit server A's JAAS module which will reject it. That will kick in a call to the "external agency" to check out a valid credential from the vault, sign it, and try again. This time Server A accepts it. If it needs to send a sub query to Server B, it uses our four lines of mod code to inject the credential, send it, and Server B's JAAS module will assert it. If it passes it goes on down the line. If not (expired or man-in-the-middle attack), it goes back to Server A, where Server A again does the same check out procedure. And so on.

          Four lines of code; everything else externalized in custom code modules that are outside core SOLR. May be a fairly simple use case, but it works for us and most importantly, passes IA and external security Audit muster.

          Show
          David Webster added a comment - Yes, we evaluated the entire patch set and realized all we needed to implement something meeting our requirements was about four lines of new code at the HTTPClient level, that calls out to another external auth module (an outside agency), when needed, to refresh credentials. Not really "hardcoded" at all, but within the narrow context of the client, I suppose you could call it that. And yes, on the receiving end all security is offloaded to a JAAS module; doable because these nodes are housed in Tomcat. If it's a new request coming into the processing cloud or the token has expired, it gets a new credential for the "outside agency". Rinse and repeat. For your example, an outside client sending in a query (R1), that then has to propagate a sub request to R2 to Server B. R1 will hit a "mediator". That mediator will attempt to send the query to Server A. That will hit server A's JAAS module which will reject it. That will kick in a call to the "external agency" to check out a valid credential from the vault, sign it, and try again. This time Server A accepts it. If it needs to send a sub query to Server B, it uses our four lines of mod code to inject the credential, send it, and Server B's JAAS module will assert it. If it passes it goes on down the line. If not (expired or man-in-the-middle attack), it goes back to Server A, where Server A again does the same check out procedure. And so on. Four lines of code; everything else externalized in custom code modules that are outside core SOLR. May be a fairly simple use case, but it works for us and most importantly, passes IA and external security Audit muster.
          Per Steffensen added a comment -

          Potentially related to SOLR-6625. We might want to use the more generic callback functionality for adding credentials (in some cases)?

          David Webster added a comment -

          I think in most use cases the security requirement is relatively simple. We need a mechanism where each cloud node "trusts" the cluster member calling it. The credential being passed around is kind of the "trusted secret" of the cluster members. Users can provide their own mechanism to obtain that credential and to assert it on the front side of each node. We had to modify about four lines of code on the back (client) side to simply allow us to place a credential on the request.

          In our case, we front-end the entire SOLR cluster with a set of "mediators" whose job it is to convert an external, asserted credential into the internal SOLR credential.

          Per Steffensen added a comment -

          So maybe one suggestion I would make is - don't give up on this patch. Use this issue as an umbrella issue for adding basic-auth to Solr. Tell us "here's the entire piece of work that I've done, now I want to do it in steps (baby steps preferred)"

          I am not too happy about this approach in general. I like that only fully done features get committed. If you artificially break a complete feature into small steps AND COMMIT THEM, you will temporarily have partly done features. No one says that you will ever get the last parts committed. I see way too much code in Solr that really smells like "we will just do this simple hacky thing here without considering nice code-structure/design and complete features - we will complete, refactor and make it nice later" - but "later" never showed up. One example that I stumbled over just recently is SOLR-5768 - I had to clean it up in SOLR-6795, SOLR-6796, SOLR-6812 and SOLR-6813. Breaking up a complete feature, including thorough testing etc., into smaller pieces will force us into that area of partly done stuff. I am afraid that half of the "subsequent issues" under the "umbrella" will never be committed.

          That said, I am willing to try. But I want to wait to see what SOLR-6625 brings. It might very well establish a platform for solving this SOLR-4470 much more easily.

          Kapil Malik added a comment -

          Hi,
          I have a Solr cloud installation (Solr 4.10.2) and I want to authenticate requests made via the SolrJ client using CloudSolrServer
          (Refer : http://wiki.apache.org/solr/Solrj#Using_with_SolrCloud )

          I followed https://wiki.apache.org/solr/SolrSecurity to add basic authentication by editing webdefault.xml, jetty.xml etc., and when I start the first node I am able to see the auth prompt on accessing it via a web-browser, and log in successfully. I am also able to access it via the SolrJ client using CloudSolrServer and HttpClientUtil.setBasicAuth((DefaultHttpClient) cloudServer.getLbServer().getHttpClient(), username, password);
          But the 2nd node does not start properly; it gives a 401 error:
          ERROR org.apache.solr.cloud.RecoveryStrategy – Error while trying to recover. core=mycollection_shard1_replica1:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected mime type application/octet-stream but got text/html
          <401 error message html>
          at java.util.concurrent.FutureTask.report(FutureTask.java:122)
          at java.util.concurrent.FutureTask.get(FutureTask.java:188)
          at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
          at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
          at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)

          The Solr security wiki relies on SOLR-4470 for inter-solr-node requests (org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory etc.)

          Since SOLR-4470 is not implemented yet, is there any way I can secure the calls made to my Solr cloud?
          Or is it impossible today to add basic authentication to requests made to SolrCloud via the SolrJ client using CloudSolrServer?
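          (Editor's note: for readers unfamiliar with the mechanism being discussed, HTTP basic auth just adds an `Authorization: Basic <base64(user:pass)>` header to each request; this is what `HttpClientUtil.setBasicAuth` arranges on the SolrJ side. A minimal, self-contained sketch of the header construction - the class and method names are illustrative, not part of Solr:)

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper: builds the header value that HTTP basic auth
// (RFC 7617) puts on each request. SolrJ/HttpClient does the same
// internally once credentials are configured.
public class BasicAuthHeader {

    // Returns e.g. "Basic c29scjpTb2xyUm9ja3M=" for ("solr", "SolrRocks")
    public static String build(String user, String password) {
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        System.out.println(build("solr", "SolrRocks"));
    }
}
```

          The point of this issue is that only requests from clients you control can carry this header; the requests Solr-nodes send to each other do not.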

          Per Steffensen added a comment -

          As you noticed, you can configure standard web-container protection in web.xml/jetty.xml, which makes Jetty require a username/password to grant access. This means that the requests going in to Solr-nodes (servers) must have username/password added. As long as you send requests from an outside SolrJ client (where you control the code) you can add the credentials, and you will get access. But Solr-nodes themselves send requests to other Solr-nodes, and before SOLR-4470 you have no (easy) way to add credentials to those requests - hence they will not succeed. I guess what happens in your case is that recovery needs to send inter-solr-node requests that fail due to no-access. To get a nice and easy way out of this, vote for SOLR-4470 (or some other JIRA issue that will make it possible to add credentials to inter-solr-node requests - e.g. SOLR-6625 will add a more general feature (callback) that will enable you to get around it).

          It might not be entirely impossible for you to get around the problem. I guess there are several more or less hacky options. E.g.

          • Some 3rd-party tool that intercepts requests between solr-nodes and adds credentials
          • You probably have a well-known set of IPs on which the solr-nodes run. You cannot do it in a standard web-container, but Jetty might support configuring a set of IPs that do not have to authenticate to get access (if that is acceptable to you). Or maybe you can implement your own bean and plug it into Jetty.
          • I guess you could discard standard web-container security and implement it all yourself. E.g. in a filter that you add in front of SolrDispatchFilter
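          (Editor's note: the last option - a custom filter in front of SolrDispatchFilter - would essentially re-implement the basic-auth check yourself. A rough sketch of just the per-request header validation such a filter would run; the servlet plumbing is omitted to keep the sketch dependency-free, and a real filter would implement javax.servlet.Filter and read the header from the HttpServletRequest. Class and method names are hypothetical:)

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the check a custom auth filter in front of
// SolrDispatchFilter would perform on each incoming request.
public class BasicAuthCheck {

    // Validates an "Authorization" header value against expected credentials.
    // Returns false (i.e. the filter should send 401) on missing, non-basic,
    // malformed, or wrong credentials.
    public static boolean isAuthorized(String header,
                                       String expectedUser,
                                       String expectedPassword) {
        if (header == null || !header.startsWith("Basic ")) {
            return false; // no basic credentials supplied
        }
        String decoded;
        try {
            decoded = new String(
                    Base64.getDecoder().decode(header.substring("Basic ".length())),
                    StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return false; // malformed base64
        }
        int colon = decoded.indexOf(':');
        if (colon < 0) {
            return false; // not of the form user:password
        }
        return decoded.substring(0, colon).equals(expectedUser)
                && decoded.substring(colon + 1).equals(expectedPassword);
    }
}
```

          This only solves the checking side, of course; the hard part this issue is about remains adding the credentials to the requests the Solr-nodes send to each other.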

          But I believe it is not easy before SOLR-4470.

          Back when I wrote the code for SOLR-4470, I also modified https://wiki.apache.org/solr/SolrSecurity to show how things will work after SOLR-4470. Carefully (trying) to note that this depends on SOLR-4470 being included in the product.

          Kapil Malik added a comment -

          Thanks a lot Per Steffensen for the quick and informative response!
          Indeed, the Solr security wiki carefully mentions the dependency on SOLR-4470 for different scenarios. But it is almost two years since this ticket was opened. Given that it's a fairly decent feature request (and not exactly a wishful fancy), I sincerely hope SOLR-4470 is implemented and checked in asap.
          I will look into your 2nd suggestion; it might be most appropriate for my requirement. Thanks again.

          Per Steffensen added a comment - - edited

          There is a variant of the 2nd suggestion that I just want to mention briefly (I am not good at being brief) - something that we do for other reasons. You might want to divide your set of Solr-nodes into two sub-sets

          • Search-Solrs: Receiving incoming search requests from outside clients. Not containing any data themselves but orchestrating the distributed searches against the Data-Solrs
          • Data-Solrs: Not directly accessible (network-wise) to outside clients

          Setting up web-container security on the Search-Solrs only might work, because they do not have to communicate with each other. And it might be enough for you if the Data-Solrs are not accessible from outside.
          We have such a setup, but not for security reasons - we have web-container security on both Search-Solrs and Data-Solrs. I just wanted to mention it because it also might make things easier for you, and since I know it is possible, this is also an option for you. We do the separation because of different hardware requirements for the two kinds of Solr-nodes: Search-Solrs need more memory but almost no disk-space, Data-Solrs need more disk-space but less memory - in our setup, with the amounts and distribution of data we have and for the searches we perform against the Solr-cluster.

          Technically, what we do is make a "search-collection" containing no data. The "search-collection" only has shards on the Search-Solrs. Then we have several "data-collections" (containing the data) only having shards on the Data-Solrs. We have custom (not much code) SearchHandler/SearchComponents (Solr concepts) that basically just calculate the "data-collections" you have to search and forward the searches that outside clients send to "search-collection" on to those "data-collections". We have an advanced algorithm for calculating the "data-collections" that have to be searched given the concrete search against "search-collection", but you can just always calculate "all data-collections". This way Search-Solrs handle the distributed search, while Data-Solrs handle searches against their local shards only. This is kinda the same as putting your own custom gateway in front of your Solr-nodes and making the Solr-nodes inaccessible directly from outside clients. We just use Solr itself as this front gateway, taking advantage of all the nice features - high availability (with more than one Search-Solr) etc.

          Clients use CloudSolrServer, but always make searches against the "search-collection".


            People

            • Assignee: Jan Høydahl
            • Reporter: Per Steffensen
            • Votes: 19
            • Watchers: 32