SOLR-1895

ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time

    Details

      Description

      I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service. The component requires you to configure the appropriate authority service URL base, e.g.:

      <!-- LCF document security enforcement component -->
      <searchComponent name="lcfSecurity" class="LCFSecurityFilter">
        <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
      </searchComponent>

      Also required are the following schema.xml additions:

      <!-- Security fields -->
      <field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
      <field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
      <field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
      <field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>

      Finally, to tie it into the standard request handler, it seems to need to run last:

      <requestHandler name="standard" class="solr.SearchHandler" default="true">
        <arr name="last-components">
          <str>lcfSecurity</str>
        </arr>
        ...

      I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer. It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely.

      1. LCFSecurityFilter.java (11 kB, Karl Wright)
      2. LCFSecurityFilter.java (10 kB, Karl Wright)
      3. LCFSecurityFilter.java (10 kB, Karl Wright)
      4. LCFSecurityFilter.java (10 kB, Karl Wright)
      5. SOLR-1895.patch (26 kB, Koji Sekiguchi)
      6. SOLR-1895.patch (25 kB, Karl Wright)
      7. SOLR-1895.patch (25 kB, Karl Wright)
      8. SOLR-1895.patch (25 kB, Koji Sekiguchi)
      9. SOLR-1895.patch (24 kB, Koji Sekiguchi)
      10. SOLR-1895.patch (18 kB, Karl Wright)
      11. SOLR-1895-queries.patch (61 kB, Koji Sekiguchi)
      12. SOLR-1895-queries.patch (63 kB, Karl Wright)
      13. SOLR-1895-queries.patch (63 kB, Karl Wright)
      14. SOLR-1895-queries.patch (39 kB, Karl Wright)
      15. SOLR-1895-queries.patch (39 kB, Karl Wright)
      16. SOLR-1895-service-plugin.patch (33 kB, Ryan McKinley)
      17. SOLR-1895-service-plugin.patch (33 kB, Ryan McKinley)

          Activity

          Karl Wright added a comment -

          Original grant of source code for LCFSecurityFilter.

          Karl Wright added a comment -

          Updated version, using getFilters() and setFilters(), which seems resilient about where it is placed in the execution chain.

          Karl Wright added a comment -

          Another revision that explicitly uses ConstantScoreQuery and filters to guarantee no scoring effects (and no max boolean clause problems).

          Karl Wright added a comment -

          One thing I forgot in the original description...

          The plug-in looks for an authenticated user name in the input argument "AuthenticatedUserName", which should be in the form 'user@domain' if this is an AD user. If no such argument is found, the plug-in allows only open documents to be returned (that is, those that have no security whatsoever). Note that the plug-in makes no attempt whatsoever to authenticate the user; that is presumed to take place at another level.
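
A minimal sketch of the intended behavior, written as a plain-Java model rather than as the actual LCFSecurityFilter code. It assumes that a security level (share or document) with no tokens at all is "open", and that access at a secured level needs at least one allow match and no deny match. Those rules, and the names TokenCheck and Level, are assumptions made for illustration; only the four field names and the "open documents only" fallback come from this ticket.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Plain-Java model of the check; hypothetical names, not the plug-in's code.
final class TokenCheck {
    // One security level of an indexed document, e.g. allow_token_share / deny_token_share.
    static final class Level {
        final List<String> allow;
        final List<String> deny;

        Level(List<String> allow, List<String> deny) {
            this.allow = allow;
            this.deny = deny;
        }

        boolean permits(Set<String> userTokens) {
            // ASSUMPTION: a level with no tokens at all is unsecured ("open").
            if (allow.isEmpty() && deny.isEmpty()) {
                return true;
            }
            // ASSUMPTION: access needs at least one allow match and no deny match.
            boolean allowed = allow.stream().anyMatch(userTokens::contains);
            boolean denied = deny.stream().anyMatch(userTokens::contains);
            return allowed && !denied;
        }
    }

    // A document is visible only if both the share level and the document level permit the user.
    static boolean visible(Level share, Level document, Set<String> userTokens) {
        return share.permits(userTokens) && document.permits(userTokens);
    }

    // No AuthenticatedUserName: the user has no tokens, so only open documents pass.
    static boolean visibleToAnonymous(Level share, Level document) {
        return visible(share, document, Collections.<String>emptySet());
    }
}
```

In the plug-in itself this logic is expressed as index filters rather than evaluated per document; the model only shows which documents should survive.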

          Peter Sturge added a comment -

          It's worth bearing in mind that more than just a username is required in the input in order to ensure secure access. Otherwise, security is compromised simply by guessing (or already knowing) the username of someone with higher privileges.

          For example:
          User Dishwasher has low privileges
          User Admin has high privileges

          When Dishwasher logs in, all he/she has to do is put Admin's name in the input argument, and he/she has now assumed Admin's rights.
          User Admin doesn't need to be logged in for this to happen.

          Karl Wright added a comment -

          >>>>>>
          Otherwise, security is compromised simply by guessing (or already knowing) the username of someone with higher privileges.
          <<<<<<

          That is obviously true, so anyone setting up such a system would not want the Solr webapp to be accessible by just anyone. The presumption is that the Solr webapp is not the final user interface, and is indeed not accessible to the user at all. This problem is generally addressed by controlling exactly who can connect to the Solr socket.

          But also bear in mind that the following "security holes" exist, and need to be dealt with appropriately:

          (1) The index files themselves contain potentially secure data;
          (2) solrconfig.xml can be changed to point to a spoofed authority service.

          So, in general, locking down the box that runs Solr is the only way to go.

          Peter Sturge added a comment -

          >>>>>>
          The presumption is that the Solr webapp is not the final user interface, and is indeed not accessible to the user at all.
          <<<<<<

          Given that search requests are http-based, how would this be done, in say, an intranet environment? I agree that a user interface wouldn't expose any means to change the http parameters, but if http is available to the UI, it'll also be available to a web browser's search bar at the same station (unless some tunnelling, proxy or similar is used).

          Totally agree on the server lock down - hopefully, everyone does this already as a matter of course!

          There are a couple of ways to address the impersonator problem. Probably the most robust way is to use SSL authentication from client to container, then have the Solr app integrate with the container (like we talked about for the authentication piece) and use its session certificate to ensure that any requests coming from the remote station match those of the originally authenticated user.

          A somewhat easier method is to use the hash and session id mechanism used in SOLR-1872. This provides pgp protection for stopping impersonation (even gaining any access from a browser), but wouldn't be suitable outside of an intranet environment (for exposed internet access, it would really need to be SSL - for sensitive data, though, you wouldn't expect it to be exposed across a DMZ anyway).

          Karl Wright added a comment -

          >>>>>>
          Given that search requests are http-based, how would this be done, in say, an intranet environment?
          <<<<<<

          The usual way is to configure the application server running solr to either use certificate authentication (which requires the connecting client to be able to identify themselves via a secure cert), or if on a Unix box, configure the application server to not accept connections from (say) anything other than the localhost adapter.

          Peter Sturge added a comment -

          >>>>>>
          The usual way is to configure the application server running solr to either use certificate authentication (which requires the connecting client to be able to identify themselves via a secure cert)
          <<<<<<

          Yes, cert authentication is a good way to go, but once you've got one (because you have at least some privileges), you can bypass the lower-layer doc security because you've already done the cert auth.

          >>>>>>
          configure the application server to not accept connections from (say) anything other than the localhost adapter.
          <<<<<<

          I don't understand how localhost-only would give you any access off the box.
          I guess what I meant was, your client is wherever your client is, and this client could (and probably would) have a web browser installed. If a bona-fide user was an IT Operator, it would be easy for him/her to 'pretend' to be an HR Manager, unless some kind of post-login identity check prevents it.

          One way 'round this is to encrypt part or all of the http parameters (essentially, this is what the hash mechanism does in SOLR-1872).
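
As a rough illustration of the idea of protecting the user-name parameter, here is a generic HMAC sketch. It is not the SOLR-1872 implementation, whose details are not given in this thread. The class name SignedUserParam, the choice of HMAC-SHA256, and the shared secret between the front end and the search component are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Generic HMAC sketch; hypothetical names, NOT the SOLR-1872 mechanism.
final class SignedUserParam {
    // Returns a hex HMAC-SHA256 over the session id and user name.
    static String sign(byte[] secret, String sessionId, String userName) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            mac.update(sessionId.getBytes(StandardCharsets.UTF_8));
            mac.update((byte) 0); // NUL separator keeps the two fields from running together
            byte[] raw = mac.doFinal(userName.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }

    // Receiving side: recompute the signature and compare in constant time.
    static boolean verify(byte[] secret, String sessionId, String userName, String signature) {
        byte[] expected = sign(secret, sessionId, userName).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

With such a check, a request that swaps in another user's name without knowing the secret fails verification.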

          Karl Wright added a comment -

          >>>>>>
          I don't understand how localhost-only would give you any access off the box.
          <<<<<<

          You would not give any access to the port that Solr is running on off-box. That's the whole point. The presumption would be that another end-user-friendly application would be running on some other port on the same box, and it would talk to solr via the loop-back adapter. To repeat, nothing off-box should be able to get to Solr directly.

          Peter Sturge added a comment -

          Right, ok, so you're using a proxy app to 'shield' the Solr http. Yes, that will work well for user searches. Sorry, I didn't see this app in the jira post.
          For replication, distributed searches, spell checkers etc., I guess these could also go through the proxy app as well - the app would need to support all those mechanisms.

          Karl Wright added a comment -

          There's no "proxy app" in the jira post because I obviously didn't provide an end-user application to go along with Solr. I expect that end-users will not be interacting with Solr directly anyway, since it provides mainly API-level access but not much end-user-friendly access, other than administration, which you probably don't want end-users to be able to do either. And finally, allowing access to the update/delete API to end users is a huge security hole all by itself.

          So, granted there is another application involved, it's going to need to do several things:

          (0) Present itself appropriately in the end-user environment
          (1) Authenticate the end-user
          (2) Submit requests to Solr in proper form
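
Step (2) might look roughly like the sketch below, assuming the front-end application has already authenticated the user and takes the name from its own session, never from anything the browser supplies. The Solr address and the class name SolrRequestBuilder are hypothetical; only the AuthenticatedUserName parameter comes from this ticket.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical front-end helper; not part of the attached patch.
final class SolrRequestBuilder {
    // HYPOTHETICAL address: Solr reachable only over the loop-back adapter.
    static final String SOLR_SELECT = "http://localhost:8983/solr/select";

    // authenticatedUser must come from the application's own login session.
    static String buildSearchUrl(String userQuery, String authenticatedUser) {
        return SOLR_SELECT
            + "?q=" + encode(userQuery)
            + "&AuthenticatedUserName=" + encode(authenticatedUser);
    }

    // URL-encoding the values keeps user input from injecting extra parameters.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Because the query text is encoded, a search string such as "x&AuthenticatedUserName=admin" ends up inside the q value instead of overriding the real user name.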

          Peter Sturge added a comment -

          That makes total sense to keep a proxy app separate.

          Why wouldn't users interact with Solr directly? There's a lot of client-side stuff available to do just that. I wouldn't have thought there are too many implementations out there that completely block Solr http read access, because this would break replication, distributed searching, spell checkers, custom handlers etc. Generally, web proxies and firewalls etc. do a good job on this side of things, which is one of the reasons doc-level security is such a tricky business: you have to let through, and then restrict inside solr.war, traffic that you would normally not let anywhere near Solr.

          You're right that /update, /admin etc. need to be 'locked-down' so that users cannot write or change anything, but this is quite straightforward.

          Anders Rask added a comment -

          Hi Karl!

          This looks very good. But it seems like a lot of duplicated work between this search component and the search component that I am developing in SOLR-1834. With my module we would also get a framework for enforcing different security models for different sources.

          I propose this:
          We cooperate to make a binding between my security component and LCF. As I see it I could use this code to implement a security provider (what I call a module that collects groups from a security source e.g. AD) that collects groups through the LCF framework from the underlying sources. And then we can cooperate to implement security models (what I call a module that enforces security in a manner consistent with that of the underlying source) for the sources supported by the LCF framework.

          How do you feel about this?

          Karl Wright added a comment -

          Hi Anders,

          Indeed, I based some of the code in this ticket on code you had contributed in SOLR-1834.

          If we cooperate, I would suggest that we take the time and effort to understand both SOLR-1834 and LCF, thoroughly. It is not clear from your comments that you are familiar with the LCF security model - which to me seems to have many of the same concepts as your offering, but built as part of an extensible crawling framework. If you want to become more familiar with LCF, I suggest that you start here, and look especially into the "Concepts and Terminology" link.

          http://incubator.apache.org/connectors/developer-resources.html

          I think I have a good idea of the code in SOLR-1834, but obviously I cannot read your intent, and how you would anticipate system integrators make use of this proposal. If you would like to clarify, please provide some use cases (e.g. reasonably detailed scenarios) so that I'm sure we are both on the same page.

          Thanks,
          Karl

          Anders Rask added a comment -

          You are right, it would be beneficial if we first have a clear understanding of both SOLR-1834 and LCF.

          I have read through the links that you gave me and I have some thoughts:

          You are talking about an "Active Directory authorization model", what do you mean by this?
          To my understanding Active Directory is a directory service where you can store certain types of objects, for example groups and users, but it is up to the data source how to use these objects in its security model.
          In NTFS, for example, belonging to a group might mean that you get access to a document, or that you don't get access to it because a deny right might be set on it.
          In Documentum, on the other hand, a group might be used in its concept of rooms: a user must first be a member of a certain group to get access to the "room of documents", but must then also be a member of another group to read a certain document in the room.

          This is where my concept of different security models for different sources comes in. For my security component to work you must know what source a document comes from. This source is then correlated to a security model in the solrconfig file. The security model will get the groups from the security provider (which in this case will get them from LCF) and use them in such a way that it emulates the security in the source.
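
To make the idea concrete, here is a small sketch of per-source security models. Every name in it (SecurityModels, SecurityModel, the "allow", "deny" and "room" slots, the source keys) is hypothetical and chosen for illustration; the real SOLR-1834 interfaces may look quite different, and the two models are deliberately simplified.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: every name here is hypothetical, not SOLR-1834's API.
final class SecurityModels {
    interface SecurityModel {
        // acl maps a slot name ("allow", "deny", "room") to group names.
        boolean canRead(Set<String> userGroups, Map<String, Set<String>> acl);
    }

    // True if the user belongs to at least one group listed in the given slot.
    private static boolean inAny(Set<String> userGroups, Map<String, Set<String>> acl, String slot) {
        Set<String> groups = acl.getOrDefault(slot, Collections.<String>emptySet());
        return !Collections.disjoint(userGroups, groups);
    }

    // NTFS-like: an allow match grants access unless a deny match overrides it.
    static final SecurityModel NTFS_LIKE =
        (user, acl) -> inAny(user, acl, "allow") && !inAny(user, acl, "deny");

    // Room-like: the user must belong to the room group and to a document group.
    static final SecurityModel ROOM_LIKE =
        (user, acl) -> inAny(user, acl, "room") && inAny(user, acl, "allow");

    // Source name -> model, standing in for the mapping kept in the solrconfig file.
    static final Map<String, SecurityModel> BY_SOURCE = new HashMap<>();
    static {
        BY_SOURCE.put("ntfs", NTFS_LIKE);
        BY_SOURCE.put("documentum", ROOM_LIKE);
    }

    static boolean canRead(String source, Set<String> userGroups, Map<String, Set<String>> acl) {
        SecurityModel model = BY_SOURCE.get(source);
        // Fail closed when a document's source has no configured model.
        return model != null && model.canRead(userGroups, acl);
    }
}
```

The same group membership gives different answers depending on the source, which is the point of correlating each source with its own model.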

          Does this make it clear what a security model is in the context of SOLR-1834?

          PS
          I should be clear right now and say that the Documentum model in my component is in no way a complete model.

          Karl Wright added a comment -

          >>>>>>
          You are talking about an "Active Directory authorization model", what do you mean by this?
          <<<<<<

          I meant the combination of a user having user/group SIDs, and files, folders, shares or other entities having access rights based on those SIDs.

          >>>>>>
          ....in Documentum a group might be used in its concept of rooms...
          <<<<<<

          Yes, of course, this would represent the basic concept of abstraction.

          I understand that SOLR-1834 tries to introduce an abstraction at this level. What I don't understand yet is how this differs from what LCF already provides (and provides in a complete and thoroughly tested manner, for some dozen kinds of repository). I remember that SOLR-1834 uses access-token-based filters to control access, and uses an interface called IRepository to get a user's access tokens, but I don't recall where it gets the access tokens attached to the documents?

          Karl Wright added a comment -

          Added the ability to work with the LCF mod_authz_annotate Apache2 plugin, as an alternate model. In this model, if the AuthenticatedUserName parameter is not present, the Search Component looks for a UserTokens array of parameters instead.

          Right now, mod_authz_annotate puts the requisite tokens into the AAAGRP header. Since it doesn't appear to be possible for Search Components to be able to get at http request headers directly, it will be up to the user's web application to read the contents of the AAAGRP header and form those into UserTokens parameters when requesting results from Solr.
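
The web application's side of this might be sketched as follows. The format of the AAAGRP header value is not described in this ticket, so the sketch assumes a comma-separated token list; that assumption, the class name UserTokenParams, and the sample token values in the comments are illustrative only. The UserTokens parameter name comes from the comment above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the front-end web application.
final class UserTokenParams {
    // ASSUMPTION: the header value is a comma-separated token list.
    // The actual AAAGRP format may differ.
    static List<String> parseHeader(String headerValue) {
        List<String> tokens = new ArrayList<>();
        if (headerValue == null) {
            return tokens;
        }
        for (String part : headerValue.split(",")) {
            String token = part.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Emits one UserTokens parameter per token, ready to append to a Solr request URL.
    static String toQueryParams(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append("&UserTokens=").append(encode(token));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```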

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment -

          3.4 -> 3.5

          Koji Sekiguchi added a comment -

          I talked with Shinichiro and realized that this is awesome!

          Some comments:

          • Can you post a patch-style file rather than a java file?
          • The class should be renamed to something other than LCF.
          • How about moving it to contrib/security or contrib/auth or somewhere else?
          • As with FacetComponent or MoreLikeThisComponent, isn't a security=on|off or auth=on|off flag needed?
          • I see there are three while loops in the attached file. For two of them, I would prefer a for loop over a while loop in terms of readability.
          • To support distributed search mode, the prepare() method should be executed only on the Solr server.
          • What do you think of the idea of having a cache for users' access tokens in this SearchComponent? If I remember correctly, MCF has a similar cache in the Authority Service, but a cache in Solr would help search performance. Ideally it could be turned on|off, with a configurable size and expiration time, etc.
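
For the caching suggestion in the last bullet, a bounded cache with expiration could be sketched as below, should it prove beneficial. This is a stand-alone illustration with hypothetical names (TokenCache, CachedTokens), not code from any attached patch; the clock is injected so that expiration can be tested without waiting.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical LRU cache of user name -> access tokens with a time-to-live.
final class TokenCache {
    private static final class CachedTokens {
        final List<String> tokens;
        final long expiresAtMillis;

        CachedTokens(List<String> tokens, long expiresAtMillis) {
            this.tokens = tokens;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final int maxSize;
    private final long ttlMillis;
    private final LongSupplier clock;
    private final LinkedHashMap<String, CachedTokens> entries;

    TokenCache(int maxSize, long ttlMillis, LongSupplier clock) {
        this.maxSize = maxSize;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder=true keeps entries in least-recently-used order.
        this.entries = new LinkedHashMap<String, CachedTokens>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedTokens> eldest) {
                return size() > TokenCache.this.maxSize;
            }
        };
    }

    // Returns the cached tokens, or null if absent or expired.
    synchronized List<String> get(String userName) {
        CachedTokens cached = entries.get(userName);
        if (cached == null) {
            return null;
        }
        if (clock.getAsLong() >= cached.expiresAtMillis) {
            entries.remove(userName);
            return null;
        }
        return cached.tokens;
    }

    synchronized void put(String userName, List<String> tokens) {
        entries.put(userName, new CachedTokens(tokens, clock.getAsLong() + ttlMillis));
    }
}
```

A production setup would pass System::currentTimeMillis as the clock and fall back to the authority service on a null result. A short expiration time limits how long revoked access stays visible to searches.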
          Karl Wright added a comment -

          I'll post an updated version that does everything except the caching. My reasoning is that this can be readily added should it prove beneficial. I would want to show that it was helpful before adding the extra configuration complexity.

          Karl Wright added a comment -

          Ran into a problem - the security filter uses Boolean Filters, which in Lucene 4.0 are now in the contrib module lucene-queries. It's not clear how to modify the contrib's build.xml file to grant access to lucene-queries. So there are two ways forward:

          (1) I might change the logic to use BooleanQuery instead of BooleanFilter, or
          (2) I have to figure out how you're supposed to modify build.xml to specify this dependency.

          Any preferences?

          Karl Wright added a comment -

          Uploaded complete patch

          Jan Høydahl added a comment -

          This is great Karl!

          Ryan McKinley added a comment -

          What else would go in the 'contrib/auth' contrib?

          Does it really need its own contrib, or just a new package in core?

          Is there really anything specific to ManifoldCF here? Perhaps this could just be AccessTokenSecurityFilter extends SearchComponent?

          Chris Male added a comment -

          I agree with Ryan.

          This doesn't need to be a contrib, it can just be a package. Solr already depends on the queries module since it contains FunctionQuery. I also like Ryan's suggestion of giving this a more generic name.

          Could we maybe create a first-class notion of a Token, rather than just passing Strings around everywhere?

          Koji Sekiguchi added a comment -

          Ok, I moved it to core in this patch.

          I added test code, but the following test doesn't pass (I cannot understand why):

            @Test
            public void testUserTokens() throws Exception {
              /* this test doesn't work...???
              assertQ(req("qt", "/mcf", "q", "*:*", "fl", "id", "UserTokens", "token2"),
                  "//*[@numFound='4']",
                  "//result/doc[1]/str[@name='id'][.='d-a12']",
                  "//result/doc[2]/str[@name='id'][.='s-d13']",
                  "//result/doc[3]/str[@name='id'][.='ds-a23-d1']",
                  "//result/doc[4]/str[@name='id'][.='notoken']");
              */
              /* this test doesn't work...???
              assertQ(req("qt", "/mcf", "q", "*:*", "fl", "id", "UserTokens", "token3"),
                  "//*[@numFound='2']",
                  "//result/doc[1]/str[@name='id'][.='ds-a23-d1']",
                  "//result/doc[2]/str[@name='id'][.='notoken']");
              */
              :
            }
          

          In this patch, I also:

          • removed unused imports
          • fixed the indentation
          • used Java 5 for loops
          • moved the Security param (in NamedList) to an mcf param (a request parameter resolved at runtime)
          • checked the isShard param at the beginning of prepare() to support distributed search
          • removed the redundant ManifoldCFSecurityFilter from log messages
          • added the filter to the example
          • read the socket timeout parameter from solrconfig.xml
          Koji Sekiguchi added a comment -

          "I'll post an updated version that does everything except the caching. My reasoning is that this can be readily added should it prove beneficial. I would want to show that it was helpful before adding the extra configuration complexity."

          I agree. Let's open another issue once this is done.

          Koji Sekiguchi added a comment -

          I forgot that I have a question. Can we remove globalAllowed?

          Koji Sekiguchi added a comment -

          Added more tests. testUserTokens() still doesn't pass (the failing tests are commented out).

          Koji Sekiguchi added a comment - - edited

          I figured out why the test doesn't pass. In this test, I provided the following documents:

          //          |     share    |   document
          //          |--------------|--------------
          //          | allow | deny | allow | deny
          // ---------+-------+------+-------+------
          // d-a12    |       |      | 1, 2  |
          // ---------+-------+------+-------+------
          // d-a1-d3  |       |      | 1     | 3
          // ---------+-------+------+-------+------
          // s-d13    |       | 1, 3 |       |
          // ---------+-------+------+-------+------
          // ds-a23-d1| 3     | 1    | 2     |
          // ---------+-------+------+-------+------
          // notoken  |       |      |       |
          // ---------+-------+------+-------+------
          

          and when querying "*:*" with UserTokens=token2, I expected to get d-a12, s-d13, ds-a23-d1 and notoken. But in reality, Solr returns d-a12 and notoken.

          This can be explained as follows:

          1. ManifoldCFSecurityFilter constructs a filter (FS) that finds docs for the share part using the following logic in calculateCompleteSubfilter():
            /** Calculate a complete subclause, representing something like:
            * ((fieldAllowShare is empty AND fieldDenyShare is empty) OR fieldAllowShare HAS token1 OR fieldAllowShare HAS token2 ...)
            *     AND fieldDenyShare DOESN'T_HAVE token1 AND fieldDenyShare DOESN'T_HAVE token2 ...
            */
            
          2. As a result of this filter, we get d-a12, d-a1-d3 and notoken (hmm, I would like to get s-d13 here).
          3. Then ManifoldCFSecurityFilter constructs a filter (FD) that finds docs for the document part using the same logic in calculateCompleteSubfilter().
          4. As a result of this filter, we get d-a12, s-d13, ds-a23-d1 and notoken.
          5. Finally, ManifoldCFSecurityFilter constructs the final filter from the above two filters:
            BooleanFilter bf = new BooleanFilter();
            bf.add(FS,Occur.MUST);
            bf.add(FD,Occur.MUST);
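
          The reasoning above can be reproduced without Lucene. The following is a minimal plain-Java simulation, not the actual ManifoldCFSecurityFilter code: it assumes the filter behaves exactly as the quoted javadoc describes, and the names AclFilterSketch, Level, passes and visible are made up for illustration. The documents are the ones from the table above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class AclFilterSketch {

    /** Allow and deny tokens for one level (share or document) of one doc. */
    static final class Level {
        final Set<String> allow;
        final Set<String> deny;

        Level(String[] allow, String[] deny) {
            this.allow = new HashSet<>(Arrays.asList(allow));
            this.deny = new HashSet<>(Arrays.asList(deny));
        }
    }

    /** (allow and deny both empty OR allow has a user token) AND deny has no user token. */
    static boolean passes(Level level, Set<String> userTokens) {
        boolean unprotected = level.allow.isEmpty() && level.deny.isEmpty();
        boolean allowed = !Collections.disjoint(level.allow, userTokens);
        boolean denied = !Collections.disjoint(level.deny, userTokens);
        return (unprotected || allowed) && !denied;
    }

    private static String[] t(String... tokens) {
        return tokens;
    }

    /** Test documents from the table: id -> { share level, document level }. */
    static Map<String, Level[]> testDocs() {
        Map<String, Level[]> docs = new LinkedHashMap<>();
        docs.put("d-a12", new Level[] {
            new Level(t(), t()), new Level(t("token1", "token2"), t()) });
        docs.put("d-a1-d3", new Level[] {
            new Level(t(), t()), new Level(t("token1"), t("token3")) });
        docs.put("s-d13", new Level[] {
            new Level(t(), t("token1", "token3")), new Level(t(), t()) });
        docs.put("ds-a23-d1", new Level[] {
            new Level(t("token3"), t("token1")), new Level(t("token2"), t()) });
        docs.put("notoken", new Level[] {
            new Level(t(), t()), new Level(t(), t()) });
        return docs;
    }

    /** Ids that pass both the share filter (FS) and the document filter (FD). */
    static Set<String> visible(Set<String> userTokens) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map.Entry<String, Level[]> doc : testDocs().entrySet()) {
            Level share = doc.getValue()[0];
            Level document = doc.getValue()[1];
            if (passes(share, userTokens) && passes(document, userTokens)) {
                ids.add(doc.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // prints [d-a12, notoken]
        System.out.println(visible(Collections.singleton("token2")));
    }
}
```

          With token2 this yields d-a12 and notoken, matching what Solr returns: s-d13 is rejected by the share level because it has deny tokens but no matching allow token.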
            
          Karl Wright added a comment -

          I think your expectation for s-d13 may be incorrect. If you use AD as a model, you are effectively applying share security that has no allow sids but some deny sids. With AD you would not get this doc either.

          Koji Sekiguchi added a comment -

          Thanks, Karl. Please do not hesitate to modify my patch to move this issue forward!

          Karl Wright added a comment -

          Fixed the test

          Koji Sekiguchi added a comment -

          Thank you for correcting my test, Karl!

          I think I found a mismatch in beforeClass(). The comment says that s-d13 is:

          //          |     share    |   document
          //          |--------------|--------------
          //          | allow | deny | allow | deny
          // ---------+-------+------+-------+------
          // s-d13    | 1,2,3 | 1, 3 |       |
          // ---------+-------+------+-------+------
          

          but the code is:

          assertU(adoc("id", "s-d13",
          "allow_token_document", "token1",
          "allow_token_document", "token2",
          "allow_token_document", "token3",
          "deny_token_share", "token1",
          "deny_token_share", "token3"));
          

          Can you correct them?

          Karl Wright added a comment -

          New version fixing the inconsistency found by Koji

          Koji Sekiguchi added a comment -

          I changed doc ids in test code (my intention in the first patch was that id implies its permissions, and now permissions have been changed, so they should be modified). Also I added allow/deny fields (commented out) in example schema.xml.

          I think this is ready to go!

          Koji Sekiguchi added a comment -

          "Is there really anything specific to ManifoldCF here? Perhaps this could just be AccessTokenSecurityFilter extends SearchComponent?"

          I think so? I think it is specific to MCF and the allow/deny token security model provided by AD/Windows.

          Chris Male added a comment -

          "I think this is ready to go!"

          I think we can tidy this up further.

          • Let's dump the constructor since it just calls super().
          • Can we refactor the default manifold URL to a constant?
          • Same with the default timeout period.
          • Some LOG.info calls are commented out; let's just delete them. If someone needs them, they can add them back themselves.
          • Is the performance of a BooleanFilter consisting of QueryWrapperFilters and WildcardQueries really better than just having a BQ? Having fewer levels of indirection when the Queries are executed seems beneficial.
          • Let's dump the process(ResponseBuilder) override; it does nothing.
          • As I commented earlier, can we have a first-class notion of a SecurityToken? Having just Strings today seems limited.

          "I think so? I think it is specific to MCF and the allow/deny token security model provided by AD/Windows."

          I don't really see anything specific to MCF here, apart from the URL. I agree it defines a certain security model but by overriding getAccessTokens, I could source the tokens from anywhere. I could have a plaintext file in my solr installation where I read them from.
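
          To make the plaintext-file idea concrete, here is a minimal sketch. It rests on several assumptions: the class name PlainTextTokenSource, the getAccessTokens(String) signature used here, and the file format (one "user=token1,token2" line per user, parsed with java.util.Properties) are all hypothetical and are not taken from the patch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical token source that reads "user=token1,token2" lines from plain text. */
final class PlainTextTokenSource {

    private final Properties tokensByUser = new Properties();

    PlainTextTokenSource(Reader source) {
        try {
            tokensByUser.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the tokens listed for the user, or an empty list for unknown users. */
    List<String> getAccessTokens(String user) {
        String line = tokensByUser.getProperty(user);
        if (line == null) {
            return Collections.emptyList();
        }
        List<String> tokens = new ArrayList<>();
        for (String token : line.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a file in the Solr installation.
        PlainTextTokenSource source =
            new PlainTextTokenSource(new StringReader("alice=token1,token2\nbob=token3\n"));
        System.out.println(source.getAccessTokens("alice")); // prints [token1, token2]
    }
}
```

          Nothing in this sketch depends on MCF, which is the point being made: only the place the tokens come from changes.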

          Erik Hatcher added a comment -

          I've just glanced at the patch, so forgive me if I'm off-base, but the examples of solrconfig here show the query component coming before the mcf component. Is that right? Shouldn't mcf come first to set the constraints for the query component's work?

          Also, what about using a PostFilter for these numerous wildcard queries, so that they are evaluated only on docs that match the rest of the query constraints?

          I'm a little wary of adding the MCF "dependency" to Solr core though (yes, I know it doesn't require MCF for compilation or run-time, but it depends on MCF's security scheme).

          What about MCF maintaining this filter as a Solr plugin rather than it going into the core of Solr?

          Koji Sekiguchi added a comment - - edited

          Thank you for reviewing the patch, Chris and Erik! I'll update the patch to incorporate some of your comments.

          For now:

          "the examples of solrconfig here show the query component coming before the mcf component. Is that right? Shouldn't mcf come first to set the constraints for the query component's work?"

          As the security filter works at prepare phase, this is right.

          "I'm a little wary of adding the MCF "dependency" to Solr core though (yes, I know it doesn't require MCF for compilation or run-time, but it depends on MCF's security scheme)."

          I agree, which is why I placed it in contrib/auth at first.

          "What about MCF maintaining this filter as a Solr plugin rather than it going into the core of Solr?"

          I'd like to hear about it from Karl.

          Erik Hatcher added a comment -

          Koji - thanks for pointing out the prepare phase use. It's very odd, to me, to see it that way though. Is there a technical reason this needs to be done in the prepare phase rather than in the process phase?

          Chris Male added a comment -

          I agree with Erik. Security should be the first step in the chain since it may have an impact on any of the later components.

          I kind of do agree that having this in Solr core is a little messy, especially since by default it depends on an external service. Yet at the same time there has been a real thirst from users for some out-of-box security system. At the moment the scope of this component is quite small, but down the line we might want a more comprehensive security system that this would just be part of?

          Erik Hatcher added a comment -

          So maybe we put in a general SecuritySearchComponent into core that delegates its work to a "SecurityFilterGenerator" plugin that gets looked up through the resource loader mechanism and must be configured in solrconfig (with an out of the box - NoSecurityFilterGenerator/Factory or something like that).

          We want to allow a PostFilter to come into play here too, so looks like we want the security filter generation to return a Query, not a Filter.

          Maybe?
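
          A rough sketch of that delegation idea, in plain Java so it stands alone. The names SecurityFilterGenerator and NoSecurityFilterGenerator come from the proposal above; everything else (the wrapper class, the filterFor method, returning the query as a string, and using Class.forName in place of Solr's resource loader) is an assumption made purely for illustration.

```java
import java.util.Optional;

/** Illustrative only: none of these types are the real Solr API. */
class SecurityDelegationSketch {

  /** Plugin contract named in the proposal; the method shape is a guess. */
  interface SecurityFilterGenerator {
    Optional<String> filterFor(String userName);
  }

  /** Out-of-the-box default that applies no restriction. */
  static class NoSecurityFilterGenerator implements SecurityFilterGenerator {
    @Override
    public Optional<String> filterFor(String userName) {
      return Optional.empty();
    }
  }

  /** Stand-in for the general component that delegates to the configured plugin. */
  static class Component {
    private final SecurityFilterGenerator generator;

    // Assumption: plain reflection stands in for Solr's resource loader lookup.
    Component(String generatorClassName) {
      try {
        generator = (SecurityFilterGenerator)
            Class.forName(generatorClassName).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot load " + generatorClassName, e);
      }
    }

    /** Returns the filter query to append to the request, if the plugin produced one. */
    Optional<String> securityFilter(String userName) {
      return generator.filterFor(userName);
    }
  }
}
```

          With a shape like this, the component itself would stay generic and only the configured class name would change per deployment.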

          Koji Sekiguchi added a comment - - edited

          Is there a technical reason this needs to be done in the prepare phase rather than in the process phase?

          The idea of this security filter is to forcibly insert security Filters before the query is executed, so I think it is obvious?

          Hmm, as for the order of search components, I think it should be placed last, because if it is not last, theoretically any later component can modify or remove the inserted security Filters.

          Chris Male added a comment -

          I like what you're suggesting Erik.

          Hmm, as for the order of search components, I think it should be placed last, because if it is not last, theoretically any later component can modify or remove the inserted security Filters.

          I'm not sure we should fight that. If someone wanted to modify the security Filters they could configure a component to come after the security component. I still feel having it last means we cannot use any information it adds to the Request in later components.

          Karl Wright added a comment -

          We want to allow a PostFilter to come into play here too, so looks like we want the security filter generation to return a Query, not a Filter.

          Maybe?

          I'll let you guys tell me what would work best, but I'm happy to convert to queries from filters if that's what's called for. Just make it clear, and I'll be off...

          Ryan McKinley added a comment -

          So maybe we put in a general SecuritySearchComponent into core that delegates its work to a "SecurityFilterGenerator" plugin that gets looked up through the resource loader

          +1 for moving:

          protected List<String> getAccessTokens(String authenticatedUserName)
          

          to an interface. Loading that dynamically would make this a good general security filter without needing to override the Component.

          We want to allow a PostFilter to come into play here too, so looks like we want the security filter generation to return a Query, not a Filter.

          I don't think PostFilter has anything to do with this issue – IIUC, Post filter is a good way to filter results after the query one-by-one. This component adds a Query to the list of filters in the prepare stage. It works in tandem with an indexing strategy that puts corresponding tokens on documents.

          Regarding the order – as long as this Component does its work in the prepare stage, the order does not matter. It just adds a filter to the list of filters.
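
          To make the "adds a filter to the list of filters" point concrete, here is a minimal sketch of how such filter queries might be assembled from a user's access tokens. The four field names come from the schema additions in the issue description; the class and method names are hypothetical, the query strings are ordinary Lucene syntax rather than the Query objects the patch builds, and the handling of documents that carry no security tokens at all is deliberately left out. Treating "-*:*" as a match-nothing filter is also an assumption about how Solr handles purely negative filter queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: builds filter query strings, not real Lucene Query objects. */
class TokenFilterSketch {
  // Security levels, matching the allow_token_* / deny_token_* schema fields.
  private static final String[] LEVELS = {"share", "document"};

  /** Quotes a token for Lucene query syntax, escaping backslashes and double quotes. */
  static String quote(String token) {
    return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** One filter per level: some allow token must match, and no deny token may match. */
  static List<String> buildFilterQueries(List<String> tokens) {
    List<String> filters = new ArrayList<>();
    if (tokens.isEmpty()) {
      // Assumption: a user with no tokens sees nothing.
      filters.add("-*:*");
      return filters;
    }
    String group = tokens.stream()
        .map(TokenFilterSketch::quote)
        .collect(Collectors.joining(" ", "(", ")"));
    for (String level : LEVELS) {
      filters.add("+allow_token_" + level + ":" + group
          + " -deny_token_" + level + ":" + group);
    }
    return filters;
  }
}
```

          Because each string is simply appended to the request's filter list, the result would not depend on where the component sits in the chain, as long as the work happens in the prepare stage.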

          Karl Wright added a comment -

          +1 for moving:

          protected List<String> getAccessTokens(String authenticatedUserName)

          to an interface. Loading that dynamically would make this a good general security filter without needing to override the Component.

          I can reorganize this and resubmit the patch accordingly. The Search Component name will change, obviously. How about "SecurityFilter", and the ManifoldCF implementation will remain "ManifoldCFSecurityFilter"? Also, there's going to be a bit more than one method in the new interface, because we'll need access to configuration information within the implementing class.

          Karl Wright added a comment -

          I can reorganize this and resubmit the patch accordingly. The Search Component name will change, obviously. How about "SecurityFilter", and the ManifoldCF implementation will remain "ManifoldCFSecurityFilter"? Also, there's going to be a bit more than one method in the new interface, because we'll need access to configuration information within the implementing class.

          I just realized that Erik's proposal is somewhat different and seems to involve a broader integration. If that's what winds up being done I'd like a pointer to an existing example so I can stay consistent. Either that or maybe somebody else does this reorg. Also, if we're going to be going back-and-forth like this and pursuing a broader integration, is it reasonable to create a SOLR-1895 branch we can iterate on? Submitting repeated patches is getting painful.

          Erik Hatcher added a comment - - edited

          I don't think PostFilter has anything to do with this issue – IIUC, Post filter is a good way to filter results after the query one-by-one. This component adds a Query to the list of filters in the prepare stage. It works in tandem with an indexing strategy that puts corresponding tokens on documents.

          I guess you're right, as that'd be an implementation detail: an implementation would create any post-filtering queries using a PostFilter implementation and then wrap them with a QueryWrapperFilter. The MCF filtering doesn't need post filtering... but maybe it'd still be advantageous to leverage the new "cost" capability of filters.

          I just realized that Erik's proposal is somewhat different and seems to involve a broader integration. If that's what winds up being done I'd like a pointer to an existing example so I can stay consistent.

          Yeah, basically like picking a spell check implementation or any of the other subplugins we have within the various SearchComponents in Solr.

          As for my involvement here - I've got no time in the near future to contribute an implementation of my idea, but it should be fairly straightforward to leverage the ideas of how other Solr SearchComponents get their implementation details via named "plugins".

          Ryan McKinley added a comment -

          Here is a quick sketch of how we could hook into the plugin loader (via SolrCoreAware).

          It still needs lots of cleanup, but it is worth looking at.

          Ryan McKinley added a comment -

          How about "SecurityFilter", and the ManifoldCF implementation will remain "ManifoldCFSecurityFilter"?

          In the patch I just posted, I called it 'AccessTokenSecurityComponent' and it delegates work to an 'AccessTokenService' – the thing that reads tokens from HTTP is 'HttpLookupAccessTokenService', but that should likely be called 'ManifoldAccessTokenService'

          Are there manifold specific things that the Component needs to know about? Assuming all Services work with the 4 field strategy

          Karl Wright added a comment -

          Are there manifold specific things that the Component needs to know about? Assuming all Services work with the 4 field strategy

          Sure, the ManifoldCF implementation would need to know the URL of the ManifoldCF authority service to connect to. Other implementations presumably would not?

          Ryan McKinley added a comment -

          In the patch, each service gets passed the NamedList params, so it can pull out whatever it wants. The base component just calls:

            @Override
            public void inform(SolrCore core) {
              service = (AccessTokenService)core.getResourceLoader().newInstance(serviceClassName);
              service.init(args);
            }
          
          Karl Wright added a comment -

          The service patch looks good - the only thing I would change is the HttpLookup... name to ManifoldCFLookup... or some such, as you said, and the component name probably should be something other than "mcf".

          Ryan McKinley added a comment -

          What are thoughts on requiring a parameter to enable security?

              SolrParams params = rb.req.getParams();
              if (!params.getBool(COMPONENT_NAME, true) || params.getBool(ShardParams.IS_SHARD, false))
                return;
          

          I know we can use an invariants param, but I don't like that this makes it hard to use the servlet container for authentication. We may want some users to be able to disable the component but not all.

          Is it a big problem if we remove the boolean check for component name?

          In a similar vein, I changed things so we get the username with:

            protected String getAuthenticatedUserName(ResponseBuilder rb) {
              return rb.req.getParams().get(AUTHENTICATED_USER_NAME);
            }
          

          This lets an overridden Component give you the username – potentially from the servlet container

          Ryan McKinley added a comment -

          Updated patch with name changes, and dropping the boolean parameter.

          Rather than add this to the example config/schema, I think it makes more sense to just document it well, since it will take integration with other systems to actually be useful.

          thoughts? maybe close?

          Karl Wright added a comment -

          REALLY close.

          Only thing I saw that was a bit weird was this, which is part of the test infrastructure:

          + <!-- test AccessToken Security Filter settings -->
          + <searchComponent name="mcf-param" class="org.apache.solr.handler.auth.AccessTokenSecurityComponent" >
          + <str name="AccessTokenService">org.apache.solr.handler.auth.ManifoldCFAccessTokenService</str>
          + <str name="AuthorityServiceBaseURL">http://localhost:8345/mcf-as</str>
          + <int name="SocketTimeOut">3000</int>
          + <str name="AllowAttributePrefix">aap-</str>
          + <str name="DenyAttributePrefix">dap-</str>
          + </searchComponent>

          Two of the settings don't apply to the AccessTokenSecurityComponent, just to the ManifoldCF implementation...

          Ryan McKinley added a comment -

          I'm not sure of the best way to do this (without rewriting Spring!)

          We want to delegate to a runtime-loaded AccessTokenService, but be able to configure it with whatever it needs. A good approach would be to have the AccessTokenService defined outside of the component and then the component would use it... but I think that would add too much complexity.

          The approach I took here is that the same NamedList gets passed to the AccessTokenService and the Component – this is weird because only some settings will apply to each.

          The test checks if the AuthorityServiceBaseURL is actually set on the constructed service.

          Weird, but I think better than the other alternatives I could think of. Any ideas/suggestions?
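
          For readers following along, the shared-settings arrangement might be sketched like this. It is not the patch code: a plain Map stands in for Solr's NamedList, the class names are hypothetical, the split of settings (URL and timeout for the service, prefixes for the component) is presumed from the test config quoted above, and the fallback values are invented for the example.

```java
import java.util.Map;

/** Illustrative only: one settings map is handed to both the component and its service. */
class SharedConfigSketch {

  /** Hypothetical service contract; the real one lives in the patch. */
  interface AccessTokenService {
    void init(Map<String, Object> args);
  }

  /** Reads only the settings that matter to a remote authority lookup. */
  static class ManifoldStyleService implements AccessTokenService {
    String authorityBaseUrl;
    int socketTimeoutMs;

    @Override
    public void init(Map<String, Object> args) {
      authorityBaseUrl = (String) args.get("AuthorityServiceBaseURL");
      Object timeout = args.get("SocketTimeOut");
      // The 1000 ms fallback is a made-up default for this sketch.
      socketTimeoutMs = (timeout instanceof Integer) ? (Integer) timeout : 1000;
    }
  }

  /** Reads only the field-prefix settings, then forwards the same map to the service. */
  static class Component {
    String allowPrefix;
    String denyPrefix;
    AccessTokenService service;

    void init(Map<String, Object> args, AccessTokenService service) {
      // Fallback prefixes are assumptions based on the schema field names.
      allowPrefix = String.valueOf(args.getOrDefault("AllowAttributePrefix", "allow_token_"));
      denyPrefix = String.valueOf(args.getOrDefault("DenyAttributePrefix", "deny_token_"));
      this.service = service;
      service.init(args); // each side silently ignores keys it does not use
    }
  }
}
```

          The trade-off is the one described above: nothing stops a key from being misspelled or ignored by both sides, which is why a test that checks the constructed service is useful.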

          Mark Miller added a comment -

          I have not looked closely enough at this yet, but two concerns I have:

          1. I don't think Solr core should have anything MCF specific in it myself.
          2. Getting into the security area is not something we should take lightly.

          Karl Wright added a comment -

          Weird, but I think better than the other alternatives I could think of. Any ideas/suggestions?

          No, I think it's fine actually, especially given the explanation.

          Karl Wright added a comment -

          I have not looked closely enough at this yet, but two concerns I have:

          Koji originally proposed this as a contrib, which is the patch that I submitted. Others thought it was better suited in the current form and thus now it is in core. Is your objection (a) that it should have remained in contrib, or (b) that it should not be committed at all?

          Mark Miller added a comment -

          (a) that it should have remained in contrib,

          This depends - if we get any MCF names out of it and it's very general, I think core is fine. If leaving in MCF names makes sense, because there are some MCF-specific things, I think contrib is the path to take. Or a piece lives in core and MCF* classes are contrib.

          (b) that it should not be committed at all?

          It's too early for me to weigh in on that - but I think getting into security is a tricky business that we really want to debate with a wide group of committers.

          Karl Wright added a comment -

          This depends - if we get any MCF names out of it and it's very general, I think core is fine. If leaving in MCF names makes sense, because there are some MCF-specific things, I think contrib is the path to take. Or a piece lives in core and MCF* classes are contrib.

          That's fine with me. The implementation class should then be moved to some contrib module. Not sure what this means as far as tests are concerned, because it is currently tested as a whole, but I'm sure we could come up with something that would permit separation into two independent tests.

          It's too early for me to weigh in on that - but I think getting into security is a tricky business that we really want to debate with a wide group of committers.

          This has been rattling around for more than a year at this point. How do we involve a wide group of committers given that? Suggestions welcome.

          Mark Miller added a comment -

          How do we involve a wide group of committers given that? Suggestions welcome.

          A lot of times, people don't weigh in until something is about to be committed - this never popped up on my radar before.

          Essentially, either my comment will pop up some other comments from other committers, or lazy consensus will take hold...

          Ryan McKinley added a comment -

          I like this patch because it builds a general allow/deny filter matrix that could work for most authentication strategies where you know security info at index time.

          It would be good to have a general starting place to implement security somewhere in the Solr codebase, though I don't see Solr getting a full security stack.

          Re packaging: I'm fine with contrib or core – contrib seems heavyweight for just this class. But as we look at things like SOLR-1834 and SOLR-1872, it would be good to have them in a different module. Maybe we should just go ahead and start in a module so it is easier to modify in the future.

          Jan Høydahl added a comment -

          Great momentum on this guys!!

          If the AccessTokenSecurityComponent is to be in core and provide a general Interface, I think it's beneficial to do at least one other implementation. How about a simple file-backed security provider?

          A great benefit with this is that later, we could easily add security fields to the example schema and exampledocs, add a security tab in the Velocity (/browse) GUI and voila! we have a demoable document level security feature right there with no other dependencies!

          It would also be valuable if Anders & Peter chimed in at this stage of the interface design with their "glasses" on.

          Karl Wright added a comment -

          How about a simple file-backed security provider?

          Are you thinking of a plugin that somehow maps an incoming user name to a list of tokens via a file? Or are you thinking of a wholly different architecture, like a post-filter?
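
          If the first reading is what's meant, a minimal sketch of such a lookup could look like the following. It is only a guess at the idea: the class name, the "user=token1,token2" line format, and the use of java.util.Properties as the file parser are all assumptions, and only the getAccessTokens signature is borrowed from the discussion above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustrative only: maps a user name to access tokens read from a properties-style file. */
class FileBackedTokensSketch {
  private final Properties userToTokens = new Properties();

  /** Expects lines such as "alice=sales,hr" (hypothetical format). */
  FileBackedTokensSketch(Reader source) {
    try {
      userToTokens.load(source);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the tokens for a user, or an empty list for unknown or missing users. */
  List<String> getAccessTokens(String authenticatedUserName) {
    List<String> tokens = new ArrayList<>();
    if (authenticatedUserName == null) {
      return tokens;
    }
    String line = userToTokens.getProperty(authenticatedUserName);
    if (line == null) {
      return tokens;
    }
    for (String raw : line.split(",")) {
      String token = raw.trim();
      if (!token.isEmpty()) {
        tokens.add(token);
      }
    }
    return tokens;
  }
}
```

          In a real deployment the Reader would presumably come from a file under the Solr config directory; that wiring is left out here.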

          I don't think the plug-in as designed would be appropriate for the post-filter-style security application. That's why Ryan alluded to SOLR-1834 above. A different search component will have to be written for that purpose, or we'll need multiple modes in one search component (which I tend to think is a bad idea).

          Yonik Seeley added a comment -

          Any reason this can't just be a custom QParserPlugin?
          It would then be enabled via something like fq={!mfc_security} (it could be added from a front-end system, or as an appends on a request handler so it can't be bypassed).

          It may make more sense being maintained in MCF land rather than Solr land?

          Karl Wright added a comment -

          It may make more sense being maintained in MFC land rather than Solr land?

          I don't think so, since then ManifoldCF would have a build dependency on Solr, which would mean either various parts get built independently, or each release of ManifoldCF would be certified against a specific release of Solr.

          If this is not going to go into Solr, where I believe it naturally belongs, then I think we might as well just keep it checked in under googlecode as part of the ManifoldCF in Action book example. The users will have to figure it all out.

          Karl Wright added a comment -

          Any reason this can't just be a custom QParserPlugin?

          I thought of that, but then the security functionality would become tied to the query parser used. There's no guarantee at all that the standard query parser would be used in all interesting cases - in fact, I've already encountered several where it wouldn't be.

          Yonik Seeley added a comment -

          I thought of that, but then the security functionality would become tied to the query parser used.

          It would work with any query parser.
          Example: q={!dismax}hello world&fq={!mfc_security}

          The filter types and query types are completely independent.

          Karl Wright added a comment - - edited

          It would work with any query parser.

          Sorry, I used the wrong term. I meant that security functionality would become tied to the standard search handler.
          Are there any conceivable cases where there might be a filter query in place already? In those cases, is there a facility for handling more than one filter query parser at a time?

          Jan Høydahl added a comment -

          How about a simple file-backed security provider?

          Are you thinking of a plugin that somehow maps an incoming user name to a list of tokens via a file? Or are you thinking of a wholly different architecture, like a post-filter?

          I'm thinking early binding. Simply an example of another AccessTokenService, e.g. o.a.s.handler.auth.SimpleFileAccessTokenService. It would read an XML file where you map statically the user's tokens. It would be a "simplest thing that could possibly work" approach to demo security in Solr, using the real APIs.

          <userTokens name="jan" passHash="secret">A B C</userTokens>
          <userTokens name="karl" passHash="secret">A C</userTokens>
          
          Ryan McKinley added a comment -

          Thinking about Component vs QueryParserPlugin.... a QueryParserPlugin could work, but I'm not sure it's the best match. Essentially we want to be able to add an arbitrary filter query to the request based on some notion of the user. This may be from parameters; it may be from the context. In distributed search, you don't apply the filters when it is a shard.

          It could be written with a QueryParserPlugin, but we don't need any of the query parsing bits. Also it seems weird to have to add a MatchAllDocuments query in the shard case.

          It may make more sense being maintained in MFC land rather than Solr land?

          Possibly – but i think it depends how general things are. It seems reasonable to have the basic building blocks for building security in solr – this is a pretty common request!

          Ryan McKinley added a comment -

          @Jan, thinking about SimpleFileAccessTokenService, I agree except I don't think there should be a passHash – we would not authenticate anything, just pass the tokens if the request says they are "jan"

          Can tokens have spaces? If so, perhaps something like:
          <user name="jan"><token>A</token><token>B</token>...
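
          A minimal sketch of such a file-backed lookup, assuming the <user>/<token> layout above wrapped in a hypothetical <users> root element. The class name SimpleFileTokens and its methods are illustrative only and are not part of any patch attached here; as suggested, nothing is authenticated, the class just maps a user name to its tokens.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: maps user names to access tokens read from XML. */
final class SimpleFileTokens {
    private final Map<String, List<String>> tokensByUser = new HashMap<>();

    /** Parses XML of the assumed form <users><user name="x"><token>A</token></user></users>. */
    static SimpleFileTokens parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            SimpleFileTokens result = new SimpleFileTokens();
            NodeList users = doc.getElementsByTagName("user");
            for (int i = 0; i < users.getLength(); i++) {
                Element user = (Element) users.item(i);
                List<String> tokens = new ArrayList<>();
                NodeList tokenNodes = user.getElementsByTagName("token");
                for (int j = 0; j < tokenNodes.getLength(); j++) {
                    // One element per token, so a token may contain spaces.
                    tokens.add(tokenNodes.item(j).getTextContent());
                }
                result.tokensByUser.put(user.getAttribute("name"), tokens);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse token file", e);
        }
    }

    /** Returns the tokens for a user, or an empty list for unknown users. */
    List<String> tokensFor(String userName) {
        return tokensByUser.getOrDefault(userName, Collections.emptyList());
    }
}
```

          With one element per token, a value such as "B C" stays a single token, which the space-separated form could not express.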

          Erik Hatcher added a comment - - edited

          Any reason this can't just be a custom QParserPlugin?

          ummm, duh! yes, that's the way to do it, IMO. I should have thought of that right off the bat, but I got blinded by the existing patch and was just keying off how it was implemented as a SearchComponent.

          Are there any conceivable cases where there might be a filter query in place already? In those cases, is there a facility for handling more than one filter query parser at a time?

          I don't understand... any number of filter queries can be specified, each using their own unique query parser, taking their own (local) parameters as necessary, or leveraging global ones if that makes sense, or using globals that are overridden by locals.

          So I think this boils down to MCF having a custom query parser in its codebase (yes, it'll have to depend on Solr, but it already does in terms of being able to index into Solr).

          And, there is already a mechanism to bake that type of filtering into any request handler so the client isn't necessarily responsible for setting it - using an "appends" section in the request handler definition to specify something like this:

          <lst name="appends">
            <str name="fq">{!mcf_security ... [optional local params]}[... optional "q" ...]</str>
          </lst>
          

          Any holes in doing it this way? Seems the cleanest/slickest way to me currently.

          Karl Wright added a comment - - edited

          So I think this boils down to MCF having a custom query parser in its codebase (yes, it'll have to depend on Solr, but it already does in terms of being able to index into Solr).

          There's currently no such dependency, but it looks like there will be shortly. (Indexing is handled via http, so there are no Solr requirements there.)

          Any holes in doing it this way? Seems the cleanest/slickest way to me currently.

          Can you give me an example of multiple filter queries being used? For example, suppose an fq argument comes into the Search Handler - how does the "appends" do the right thing? I'd like this to be transparent to other query parsers that may be in use, so I want to verify that there will be no impact in silently adding another one to the chain.

          Ryan McKinley added a comment -

          Can you give me an example of multiple filter queries being used?

          Assume the user query was:

           &q=hello&fq=type:big&fq={!dismax}hello world
          

          and the handler had appends configured as:

          <lst name="appends">
            <str name="fq">{!mcf_security}</str>
          </lst>
          

          the handler would behave as if the input were actually:

           &q=hello&fq=type:big&fq={!dismax}hello world&fq={!mcf_security}
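
          The merge described above can be sketched in plain Java. This is only an illustration of the "appends" behaviour, not Solr's actual implementation; the class AppendsDemo and the method applyAppends are made-up names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "appends" semantics; not Solr's actual code. */
final class AppendsDemo {
    /** Returns the request params with each appended value added after the request's own values. */
    static Map<String, List<String>> applyAppends(Map<String, List<String>> request,
                                                  Map<String, List<String>> appends) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : request.entrySet()) {
            // Copy so the caller's request map is left untouched.
            merged.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (Map.Entry<String, List<String>> e : appends.entrySet()) {
            // Appended values never replace what the client sent; they are added to it.
            merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return merged;
    }
}
```

          Because each fq is applied as its own filter and the results are intersected, the appended security filter narrows the result set regardless of which other filters the client supplied.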
          

          The only hole I can point to is that I'm not sure what happens if it is specified in both places. I'm also not 100% sure about the shard case.

          I'm confident a QParser would work, but i don't see any real advantage to it over a SearchComponent. The purpose of a QueryParser is to parse the query... but this does not require any parsing.


          I think the bigger question is do we want any security scaffolding in solr, or is this something that should always be delegated elsewhere. If there is strong resistance to including a general security model, we should make that clear and not waste more time sorting out the details.

          The core of this patch is an allow/deny matrix to lucene Query; this is applicable to many security strategies, not just manifold. My hope with introducing the AccessTokenService is to separate the user-to-token mapping from how the lucene

          Robert Muir added a comment -

          i agree with the bigger issue statement, in cases like this it might be good to separate this discussion of the general issue from this specific patch by raising a new ML thread.

          Karl Wright added a comment - - edited

          The core of this patch is an allow/deny matrix to lucene Query; this is applicable to many security strategies, not just manifold. My hope with introducing the AccessTokenService is to separate the user-to-token mapping

          I agree - there should be a unified framework to the degree feasible. This would allow common testing and reasonable maintenance across Lucene and Solr versions for the future.

          For ManifoldCF, there's also an unrelated release-engineering question, specifically for the ManifoldCF-specific portion of the proposal. I don't understand why we'd believe that introducing a code dependency on something like Solr/Lucene would be a good idea, especially since we'd be building a jar specifically for deployment within Solr. We do this reluctantly for a couple of other connectors but it's a complete one-off each time and always requires a great deal of work by end users. This inconvenience greatly impacts the level of deployment of the affected connectors. Since Solr is Apache licensed we could make this easier in Solr's case, but probably not without redistributing a specific version of Solr and Lucene, and providing build targets which fire up an already configured Solr/Lucene instance. We would need this also for testing, if the plugin code lived in ManifoldCF. It is also the case that the current ManifoldCF search component needed significant rework even to build between Lucene/Solr 3.x and 4.x, because many of the classes that were used changed their packages. Thus we'd likely need to redistribute more than one Solr/Lucene instance at a time, and release perhaps twice as frequently as we currently do just to keep up with the Solr/Lucene release schedule.

          Given all that, does everyone still think it is desirable for ManifoldCF to build Solr components itself? The alternative would be a Solr contrib module, which I'd be very happy with. To me, it is the obvious choice if you want a straightforward overall user experience. The underlying http-based protocol that the component will need to use is well-defined, quite complete, and is unlikely to change. The required dependencies (commons-httpclient) are already redistributed by Solr, so that shouldn't be a problem either.

          Erik Hatcher added a comment -

          The purpose of a QueryParser is to parse the query... but this does not require any parsing.

          Ryan - how about the term query parser? While not strictly taking a free form query string and "parsing" it into a Query, the general QParserPlugin is about being a Query "factory" taking whatever inputs it needs to construct that; "parser" is a bit of a misnomer with what the abstraction really defines. [I didn't understand the comment about MatchAllDocsQuery earlier either, as that doesn't seem necessary here]

          I think the bigger question is do we want any security scaffolding in solr, or is this something that should always be delegated elsewhere

          In this case, it really boils down to generating a handful of wildcard queries, it looks like, but in an MCF-specific way. I'm not sure this is, yet, a pressing need to generalize a security framework within Solr, as it's just a Query generator.
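
          To make the "Query generator" idea concrete, here is a rough sketch that turns a user's tokens into a filter query string. It is a simplification under stated assumptions, not the logic of the attached patch: it handles a single allow/deny field pair (the field names in the usage note come from the schema additions in the issue description), it omits the wildcard handling for documents that carry no tokens, and TokenFilterSketch, quote and buildFilter are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch: builds a filter query string from access tokens. */
final class TokenFilterSketch {
    /** Wraps a token in double quotes, escaping backslashes and embedded quotes. */
    static String quote(String token) {
        return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Requires at least one allow-token match and forbids any deny-token match.
     * What to do for a user with no tokens is a policy decision left to the caller.
     */
    static String buildFilter(String allowField, String denyField, List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("at least one token is required");
        }
        StringJoiner allow = new StringJoiner(" ", "+(", ")");
        StringBuilder deny = new StringBuilder();
        for (String token : tokens) {
            allow.add(allowField + ":" + quote(token));
            deny.append(" -").append(denyField).append(':').append(quote(token));
        }
        return allow.toString() + deny;
    }
}
```

          For the tokens A and B with the fields allow_token_document and deny_token_document, this yields +(allow_token_document:"A" allow_token_document:"B") -deny_token_document:"A" -deny_token_document:"B". A real component would repeat this for the share-level fields and combine the clauses.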

          Regarding the location of this capability - a Solr contrib works for me. It's tricky business deciding where to put glue code between two projects (e.g. MCF contains a Solr indexer, using this same logic, though, why shouldn't it also be in a Solr contrib/mcf too?). Perhaps the real deciding factor is a practical choice of where the maintainers of this best can work on it - and in this case it'd be MCF so that that community can maintain it directly rather than through JIRA patches and committers that aren't using MCF. But again though, in this case I'm fine with it living in Solr contrib/mcf.

          Jan Høydahl added a comment -

          I think the bigger question is do we want any security scaffolding in solr, or is this something that should always be delegated elsewhere

          In this case, it really boils down to generating a handful of wildcard queries, it looks like, but in an MCF-specific way. I'm not sure this is, yet, a pressing need to generalize a security framework within Solr, as it's just a Query generator.

          Both fq and SearchComponent would work for early binding, but when we want to extend the model with an (optional) late binding, i.e. filtering search results, fq won't cut it. A SearchComponent however can be extended not only to handle early+late binding but also any other strange requirements there may be regarding security, such as authentication by IP address, peeking at other parameters, modifying the request (or response) in some way etc. These would fit as plugins to the Security SearchComponent just as AccessTokenServices (for early-binding) are in current design.

          I'm +1 for starting to include some built-in framework support for security, else I think we'll start seeing a multitude of different ways to integrate security which is not a competitive advantage for Solr. A SC is itself only a plugin anyway so we don't enforce anything on people, but I think it makes a huge difference that it's a plugin which ships with Solr rather than each connector having its own not-up-to-date security mechanism floating around.

          In Real Life™ a deployment may include a mix of MCF and non-MCF connectors; in fact we have two customers in that situation already. The ideal would be to move everything to MCF but that might not be possible due to a custom or more fine-grained security model. Such a special case is also easier to handle with SC - I don't see how to add code to merge/unify two (possibly 3rd party) QParsers, except from creating a new umbrella one.

          We'll keep the "core" layer generic and thin. AccessTokenSecurityComponent and AccessTokenService (which should perhaps be an Interface instead) go in core, while ManifoldCFAccessTokenService and others may live wherever most convenient. I, for one, would be interested in maintaining some of these classes, and also adding a Velocity demo of it all.

          That was my +1 for SearchComponent

          @Ryan, that's true, we only need to be concerned with authenticated user, the Velocity demo tab could simulate the rest.

          Erik Hatcher added a comment -

          Both fq and SearchComponent would work for early binding, but when we want to extend the model with an (optional) late binding, i.e. filtering search results, fq won't cut it.

          Not true. There's now PostFilter to enable late binding. This might even be advantageous for this MCF filtering, as the WildcardQuery's could be expensive filters to generate and work best on the most constrained subset matching the rest of the traditional query and filters.

          A SearchComponent however can be extended not only to handle early+late binding but also any other strange requirements there may be regarding security, such as authentication by IP address, peeking at other parameters

          A QParserPlugin can see all the parameters a SearchComponent can see [createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)]

          ...else I think we'll start seeing a multitude of different ways to integrate security which is not a competitive advantage for Solr

          If we cannot elaborate those different ways at this point, then building a "framework" is only asking for it to be changed later. In what scenarios would a security filter want to modify the response?

          I don't see how to add code to merge/unify two (possibly 3rd party) QParsers, except from creating a new umbrella one.

          nested queries.

          We'll keep the "core" layer generic and thin. AccessTokenSecurityComponent and AccessTokenService (which should perhaps be an Interface instead)

          I'm not sure that those abstractions are general enough. I still think a qparser is the simplest/cleanest thing that will work here and doesn't preclude or make harder any future needs. All of these other abstractions mentioned here are overkill, IMO, to what MCF needs - all it needs is a handful of aggregated WildcardQuery's.

          Jan Høydahl added a comment -

          @Erik:
          I think we mean different things with "late binding". By "late binding", I do not think of PostFilter in Lucene, but rather the technique of verifying for each hit that the logged-in user has access to see it before showing it. This fixes the issues of the search index being out of sync with the live ACLs in the source systems during some time window after an ACL change. Combining early and late binding provides best-in-class security. Most customers don't need it but the most demanding do. See p14+ in http://www.e2conf.com/archive/presentations/downloads/FO45_Bennett.pdf for more.

          A QParserPlugin can see all the parameters a SearchComponent can see [createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)]

          Ok, did not realize that. But I would not expect a qParser to mess with other parts of the request than the query or filter that it is applied to, even if it technically could.

          ...else I think we'll start seeing a multitude of different ways to integrate security which is not a competitive advantage for Solr

          If we cannot elaborate those different ways at this point, then building a "framework" is only asking for it to be changed later. In what scenarios would a security filter want to modify the response?

          Isn't a lack of guidance the reason we're seeing multiple ways of attempting to integrate security with Solr? This component doesn't need to fix all possible cases in the first shot, but a v1.0 that works with MCF will surely be enough for many other common token-based systems, and we'll see some of those creating TokenServices of their own. I'll probably do a simple file-based one for demo purposes.

          In what scenarios would a security filter want to modify the response?

          Late binding is the most obvious example - removing hits that you are no longer entitled to see, because of latency between source system (such as a crawler) and the index. If e.g. all 10 hits on first page are suddenly no longer allowed, we'll need to re-query until we fill the requested number of rows, and modify hit counts in the response accordingly.
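The re-query loop described above can be sketched in plain Java with no Solr dependency. This is only an illustration under stated assumptions: `fetchPage` and the `isStillAllowed` predicate are hypothetical stand-ins for a search call and a live ACL check, and neither is a Solr or MCF API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Late-binding sketch: re-check each hit against live ACLs and re-query until the page is full. */
final class LateBindingSketch {

  /** Hypothetical stand-in for a search call: returns hits [start, start + rows) of a ranked list. */
  static List<String> fetchPage(List<String> rankedHits, int start, int rows) {
    int from = Math.min(start, rankedHits.size());
    int to = Math.min(start + rows, rankedHits.size());
    return rankedHits.subList(from, to);
  }

  /** Collects up to `rows` hits that pass the live check, fetching further pages as needed. */
  static List<String> firstAllowed(List<String> rankedHits, int rows, Predicate<String> isStillAllowed) {
    List<String> page = new ArrayList<>();
    int start = 0;
    while (page.size() < rows) {
      List<String> candidates = fetchPage(rankedHits, start, rows);
      if (candidates.isEmpty()) {
        break; // index exhausted before the page could be filled
      }
      for (String id : candidates) {
        if (page.size() < rows && isStillAllowed.test(id)) {
          page.add(id);
        }
      }
      start += candidates.size();
    }
    return page;
  }

  public static void main(String[] args) {
    List<String> ranked = List.of("d1", "d2", "d3", "d4", "d5", "d6");
    // Assumption for the demo: access to d1..d3 was revoked after indexing.
    Predicate<String> live = id -> !List.of("d1", "d2", "d3").contains(id);
    System.out.println(firstAllowed(ranked, 2, live)); // prints [d4, d5]
  }
}
```

The loop only repairs the visible page. It does not correct hit counts or facet counts, which would have to be adjusted separately.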

          All of these other abstractions mentioned here are overkill, IMO, to what MCF needs...

          MCF may not need more right now. But other connector frameworks will then have a standard place to integrate doc-level security with Solr. Common code, like how and where to pick up the authenticated userId or user from mod_authz_annotate stays in the SearchComponent. So no matter which connectors you use with Solr, the way of passing authenticated user to Solr does not change.

          Karl Wright added a comment -

          I intend to submit a new patch shortly, structured as a query parser and intended to reside in contrib/mcf. It likely won't be today, however.

          Karl Wright added a comment -

          +1 for a common framework. I'm not wedded to a search component. It is also possible to build a security query parser, of course, that has the same kind of abstraction that Jan proposes. Either one works for me so long as we don't wind up in a situation where we have to pick EITHER security OR some other thing that the user wants. Erik's explanation has relieved some of my concerns in this area but not all.

          For the moment I'm going to go ahead and write a query parser that is specific to MCF and then we'll see if there are unforeseen issues. I'll keep both the query parser and the search component around for experimentation.

          Erik Hatcher added a comment -

          I think we mean different things with "late binding". By "late binding", I do not think of PostFilter in Lucene, but rather the technique of verifying for each hit that the logged-in user has access to see it before showing it. This fixes the issues of the search index being out of sync with the live ACLs in the source systems during some time window after an ACL change.

          We mean the same thing here. That's exactly Solr's (not Lucene's) PostFilter interface that was added in 3.4. See http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

          If e.g. all 10 hits on first page are suddenly no longer allowed, we'll need to re-query until we fill the requested number of rows, and modify hit counts in the response accordingly.

          Yikes! No, that's a nightmare with faceting and so on. You need to filter inline with the main query so that every component afterwards has the proper document set. Again, PostFilter was built for this very scenario.

          Erik Hatcher added a comment -

          But I would not expect a qParser to mess with other parts of the request than the query or filter that it is applied to, even if it technically could.

          Well, a qparser shouldn't mess with other parameters, but it certainly can (and does!) use them. Think dismax and qf/pf/bq/etc. It's not just about q. In fact, you don't even need q in dismax (which then uses q.alt)

          Jan Høydahl added a comment -

          Thanks Erik for explaining about Solr's post filter and DelegatingCollector. From JavaDoc: "This collector interface also enables better performance when an external system must be consulted, since document ids may be buffered and batched into a single request to the external system."
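The batching idea in that JavaDoc quote can be sketched in plain Java. This is not the DelegatingCollector API: `BatchingAccessChecker` and its `authority` function are hypothetical names, and the function is assumed to return the subset of a batch of ids that the user may see.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Buffers document ids and checks them against an external service one batch at a time. */
final class BatchingAccessChecker {
  private final int batchSize;
  // Hypothetical external call: given a batch of ids, returns the subset the user may see.
  private final Function<List<String>, Set<String>> authority;
  private final List<String> buffer = new ArrayList<>();
  private final List<String> allowed = new ArrayList<>();
  private int requests = 0;

  BatchingAccessChecker(int batchSize, Function<List<String>, Set<String>> authority) {
    this.batchSize = batchSize;
    this.authority = authority;
  }

  /** Called once per matching document; triggers a request only when the buffer is full. */
  void collect(String docId) {
    buffer.add(docId);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  /** Called after the last document so that a partially filled buffer is still checked. */
  void finish() {
    if (!buffer.isEmpty()) {
      flush();
    }
  }

  private void flush() {
    Set<String> ok = authority.apply(new ArrayList<>(buffer));
    requests++;
    for (String id : buffer) {
      if (ok.contains(id)) {
        allowed.add(id);
      }
    }
    buffer.clear();
  }

  List<String> allowed() { return allowed; }

  int requests() { return requests; }

  public static void main(String[] args) {
    Set<String> visible = Set.of("a", "c", "e");
    BatchingAccessChecker checker = new BatchingAccessChecker(2, batch -> {
      Set<String> ok = new HashSet<>(batch);
      ok.retainAll(visible);
      return ok;
    });
    for (String id : List.of("a", "b", "c", "d", "e")) {
      checker.collect(id);
    }
    checker.finish();
    System.out.println(checker.allowed() + " in " + checker.requests() + " requests");
  }
}
```

With five ids and a batch size of two, the sketch issues three requests instead of five. Every id is still checked, so batching lowers the number of round trips but not the number of ids sent to the external service.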

          Yikes! No, that's a nightmare with faceting and so on. You need to filter inline with the main query so that every component afterwards has the proper document set. Again, PostFilter was built for this very scenario.

          Yes, late binding is expensive. That's why it's normally done only on top-N docs right before displaying, sacrificing 100% correct facet counts, but combining early and late binding makes this into a narrow corner case.

          Does the DelegatingCollector require filtering of the full result set or can it do top-X? Consulting an Authority Service (MCF or other) for all IDs in the result set is sub-optimal, even if batched, and a high price to pay for exact facet counts.

          Jan Høydahl added a comment -

          Of course, we don't need to solve late binding or other future features or frameworks right now for the MCF case. I see that both the SC and the Qparser methods are capable of meeting MCF's current needs, so the remaining questions are: a) Do we want to make security a first class concept in Solr now? and b) Does the generic code which would be necessary for any security filtering, not only MCF, justify a "framework" component at this stage?

          Karl Wright added a comment -

          While you folks debate, I've been coding, but I am running into a strange difficulty. A one-to-one conversion of filters to queries, throughout the code, without any other mods, results in a test failure, because no results are ever returned. This is probably some error I made but I haven't found one yet. I'll keep looking.

          Also, to Erik's point about WildcardQuery's, the only point of these is to identify documents that have no tokens of a type, that is, documents that have no allow-document and deny-document tokens. As the comment says, since this is a "must-not" boolean clause I had heard that there were optimizations for this case. If not, MCF can chuck in a dummy token when it indexes documents like this to improve performance. I'd love to know if that's necessary or not, in your esteemed opinions.

          Karl Wright added a comment - - edited

          Here's the diff, which looks perfectly fine to me. If anybody knows why this shouldn't work, please let me know. The first incarnation of the security filter used queries, and that was fine, but that was a year ago now.

          Index: src/java/org/apache/solr/mcf/ManifoldCFSecurityFilter.java
          ===================================================================
          --- src/java/org/apache/solr/mcf/ManifoldCFSecurityFilter.java	(revision 1173895)
          +++ src/java/org/apache/solr/mcf/ManifoldCFSecurityFilter.java	(working copy)
          @@ -150,7 +150,8 @@
                 userAccessTokens = getAccessTokens(authenticatedUserName);
               }
           
          -    BooleanFilter bf = new BooleanFilter();
          +    BooleanQuery bq = new BooleanQuery();
          +    //bf.setMaxClauseCount(100000);
               
               if (userAccessTokens.size() == 0)
               {
          @@ -159,28 +160,26 @@
                 // (fieldAllowShare is empty AND fieldDenyShare is empty AND fieldAllowDocument is empty AND fieldDenyDocument is empty)
                 // We're trying to map to:  -(fieldAllowShare:*) , which should be pretty efficient in Solr because it is negated.  If this turns out not to be so, then we should
                 // have the SolrConnector inject a special token into these fields when they otherwise would be empty, and we can trivially match on that token.
          -      bf.add(new FilterClause(new QueryWrapperFilter(new WildcardQuery(new Term(fieldAllowShare,"*"))),BooleanClause.Occur.MUST_NOT));
          -      bf.add(new FilterClause(new QueryWrapperFilter(new WildcardQuery(new Term(fieldDenyShare,"*"))),BooleanClause.Occur.MUST_NOT));
          -      bf.add(new FilterClause(new QueryWrapperFilter(new WildcardQuery(new Term(fieldAllowDocument,"*"))),BooleanClause.Occur.MUST_NOT));
          -      bf.add(new FilterClause(new QueryWrapperFilter(new WildcardQuery(new Term(fieldDenyDocument,"*"))),BooleanClause.Occur.MUST_NOT));
          +      bq.add(new WildcardQuery(new Term(fieldAllowShare,"*")),BooleanClause.Occur.MUST_NOT);
          +      bq.add(new WildcardQuery(new Term(fieldDenyShare,"*")),BooleanClause.Occur.MUST_NOT);
          +      bq.add(new WildcardQuery(new Term(fieldAllowDocument,"*")),BooleanClause.Occur.MUST_NOT);
          +      bq.add(new WildcardQuery(new Term(fieldDenyDocument,"*")),BooleanClause.Occur.MUST_NOT);
               }
               else
               {
                 // Extend the query appropriately for each user access token.
          -      bf.add(new FilterClause(calculateCompleteSubfilter(fieldAllowShare,fieldDenyShare,userAccessTokens),BooleanClause.Occur.MUST));
          -      bf.add(new FilterClause(calculateCompleteSubfilter(fieldAllowDocument,fieldDenyDocument,userAccessTokens),BooleanClause.Occur.MUST));
          +      bq.add(calculateCompleteSubquery(fieldAllowShare,fieldDenyShare,userAccessTokens),BooleanClause.Occur.MUST);
          +      bq.add(calculateCompleteSubquery(fieldAllowDocument,fieldDenyDocument,userAccessTokens),BooleanClause.Occur.MUST);
               }
           
               // Concatenate with the user's original query.
          -    //FilteredQuery query = new FilteredQuery(rb.getQuery(),bf);
          -    //rb.setQuery(query);
               List<Query> list = rb.getFilters();
               if (list == null)
               {
                 list = new ArrayList<Query>();
                 rb.setFilters(list);
               }
          -    list.add(new ConstantScoreQuery(bf));
          +    list.add(new ConstantScoreQuery(bq));
             }
           
             @Override
          @@ -193,28 +192,27 @@
             * ((fieldAllowShare is empty AND fieldDenyShare is empty) OR fieldAllowShare HAS token1 OR fieldAllowShare HAS token2 ...)
             *     AND fieldDenyShare DOESN'T_HAVE token1 AND fieldDenyShare DOESN'T_HAVE token2 ...
             */
          -  protected Filter calculateCompleteSubfilter(String allowField, String denyField, List<String> userAccessTokens)
          +  protected Query calculateCompleteSubquery(String allowField, String denyField, List<String> userAccessTokens)
             {
          -    BooleanFilter bf = new BooleanFilter();
          +    BooleanQuery bq = new BooleanQuery();
          +    bq.setMaxClauseCount(1000000);
               
               // Add a clause for each token.  This will be added directly to the main filter (as a deny test), as well as to an OR's subclause (as an allow test).
          -    BooleanFilter orFilter = new BooleanFilter();
          +    BooleanQuery orQuery = new BooleanQuery();
          +    orQuery.setMaxClauseCount(1000000);
          +
               // Add the empty-acl case
          -    BooleanFilter subUnprotectedClause = new BooleanFilter();
          -    subUnprotectedClause.add(new FilterClause(new QueryWrapperFilter(new WildcardQuery(new Term(allowField,"*"))),BooleanClause.Occur.MUST_NOT));
          -    subUnprotectedClause.add(new FilterClause(new QueryWrapperFilter(new WildcardQuery(new Term(denyField,"*"))),BooleanClause.Occur.MUST_NOT));
          -    orFilter.add(new FilterClause(subUnprotectedClause,BooleanClause.Occur.SHOULD));
          +    BooleanQuery subUnprotectedClause = new BooleanQuery();
          +    subUnprotectedClause.add(new WildcardQuery(new Term(allowField,"*")),BooleanClause.Occur.MUST_NOT);
          +    subUnprotectedClause.add(new WildcardQuery(new Term(denyField,"*")),BooleanClause.Occur.MUST_NOT);
          +    orQuery.add(subUnprotectedClause,BooleanClause.Occur.SHOULD);
               for (String accessToken : userAccessTokens)
               {
          -      TermsFilter tf = new TermsFilter();
          -      tf.addTerm(new Term(allowField,accessToken));
          -      orFilter.add(new FilterClause(tf,BooleanClause.Occur.SHOULD));
          -      tf = new TermsFilter();
          -      tf.addTerm(new Term(denyField,accessToken));
          -      bf.add(new FilterClause(tf,BooleanClause.Occur.MUST_NOT));
          +      orQuery.add(new TermQuery(new Term(allowField,accessToken)),BooleanClause.Occur.SHOULD);
          +      bq.add(new TermQuery(new Term(denyField,accessToken)),BooleanClause.Occur.MUST_NOT);
               }
          -    bf.add(new FilterClause(orFilter,BooleanClause.Occur.MUST));
          -    return bf;
          +    bq.add(orQuery,BooleanClause.Occur.MUST);
          +    return bq;
             }
             
             //---------------------------------------------------------------------------------
          
          Karl Wright added a comment -

          Doing some debugging on the test yields no joy. Here's a chunk of the output (where I dump the security part of the query that is being applied):

              [junit] ------------- Standard Error -----------------
              [junit] +((-allow_token_share:* -deny_token_share:*) allow_token_share:token1 -deny_token_share:token1) +((-allow_token_document:* -deny_token_document:*) allow_token_document:token1 -deny_token_document:token1)
              [junit] 22/09/2011 08:26:50 ? org.apache.solr.SolrTestCaseJ4 assertQ
              [junit] SEVERE: REQUEST FAILED: xpath=//*[@numFound='3']
              [junit]     xml response was: <?xml version="1.0" encoding="UTF-8"?>
              [junit] <response>
              [junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">116</int><lst name="params"><str name="echoParams">all</str><str name="fl">id</str><str name="q">*:*</str><str name="qt">/mcf</str><str name="UserTokens">token1</str><str name="mcf">true</str></lst></lst><result name="response" numFound="0" start="0"></result>
              [junit] </response>
          

          The query looks correct, and given the following data:

              //             |     share    |   document
              //             |--------------|--------------
              //             | allow | deny | allow | deny
              // ------------+-------+------+-------+------
              // da12        |       |      | 1, 2  |
              // ------------+-------+------+-------+------
              // da13-dd3    |       |      | 1,3   | 3
              // ------------+-------+------+-------+------
              // sa123-sd13  | 1,2,3 | 1, 3 |       |
              // ------------+-------+------+-------+------
              // sa3-sd1-da23| 3     | 1    | 2,3   |
              // ------------+-------+------+-------+------
              // notoken     |       |      |       |
              // ------------+-------+------+-------+------
          

          ... I would indeed expect three documents to be returned by that query: da12, da13-dd3, and notoken.
          So I have to conclude that there's currently a bug in trunk in BooleanQuery. Is anybody looking at this?
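The expectation of three documents can be checked against the intended rule from the code comments, "((allow is empty AND deny is empty) OR allow HAS token) AND deny DOESN'T_HAVE token", applied at both share and document level. The following is a plain-Java restatement of that rule over the table above. It is a sketch, not Lucene or MCF code, and the class and method names are made up for illustration; the table's "1", "2", "3" are written as "token1", "token2", "token3".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Plain-Java restatement of the intended access rule; not Lucene or MCF code. */
final class ExpectedAccessSketch {

  /** One row of the test table: allow/deny tokens at share and document level. */
  static final class Doc {
    final String id;
    final Set<String> shareAllow, shareDeny, docAllow, docDeny;

    Doc(String id, Set<String> sa, Set<String> sd, Set<String> da, Set<String> dd) {
      this.id = id;
      this.shareAllow = sa;
      this.shareDeny = sd;
      this.docAllow = da;
      this.docDeny = dd;
    }
  }

  static final List<Doc> TABLE = List.of(
      new Doc("da12", Set.of(), Set.of(), Set.of("token1", "token2"), Set.of()),
      new Doc("da13-dd3", Set.of(), Set.of(), Set.of("token1", "token3"), Set.of("token3")),
      new Doc("sa123-sd13", Set.of("token1", "token2", "token3"), Set.of("token1", "token3"),
          Set.of(), Set.of()),
      new Doc("sa3-sd1-da23", Set.of("token3"), Set.of("token1"), Set.of("token2", "token3"),
          Set.of()),
      new Doc("notoken", Set.of(), Set.of(), Set.of(), Set.of()));

  /** (unprotected OR some token is allowed) AND no token is denied. */
  static boolean levelPasses(Set<String> allow, Set<String> deny, List<String> tokens) {
    boolean unprotected = allow.isEmpty() && deny.isEmpty();
    boolean allowed = tokens.stream().anyMatch(allow::contains);
    boolean denied = tokens.stream().anyMatch(deny::contains);
    return (unprotected || allowed) && !denied;
  }

  /** Ids of the documents that pass at both the share and the document level. */
  static List<String> visibleTo(List<String> tokens) {
    List<String> ids = new ArrayList<>();
    for (Doc d : TABLE) {
      if (levelPasses(d.shareAllow, d.shareDeny, tokens)
          && levelPasses(d.docAllow, d.docDeny, tokens)) {
        ids.add(d.id);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(visibleTo(List.of("token1"))); // prints [da12, da13-dd3, notoken]
  }
}
```

Under this rule a user holding only token1 sees da12, da13-dd3, and notoken, which agrees with the numFound='3' that the test asserts.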

          Erik Hatcher added a comment -

          Does the DelegatingCollector require filtering of the full result set or can it do top-X?

          Good point, Jan. At this point, yes, even a PostFilter would evaluate every remaining match. Currently, as far as I know, there's no partial/approximate facet count feature for this scenario. PostFilter is meant to evaluate the fewest matching documents, after all other q/fq constraints have been applied, but it still evaluates everything remaining.

          Karl -

          If anybody knows why this shouldn't work, please let me know.

          A purely negative query in Lucene has always matched nothing, so it looks like you need to add a MatchAllDocsQuery in there as a MUST (or SHOULD would work too) - I guess this is what Ryan was referring to earlier, sorry I didn't catch that detail.
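The rule above can be made concrete with a toy model of boolean-clause matching in plain Java. It is not Lucene code, and it rests on one labeled assumption: a boolean clause matches only if some positive (MUST or SHOULD) sub-clause matches, and this applies to nested clauses as well as to the top-level query. The query shape follows calculateCompleteSubquery from the diff, and the documents are the five rows of the test table; all class and method names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of boolean-clause matching; not Lucene code. */
final class BooleanSemanticsSketch {
  enum Occur { MUST, SHOULD, MUST_NOT }

  interface Q { boolean matches(Map<String, Set<String>> doc); }

  static Q term(String field, String value) { return d -> d.get(field).contains(value); }
  static Q anyValue(String field) { return d -> !d.get(field).isEmpty(); } // models field:*
  static Q matchAll() { return d -> true; }

  static final class Bool implements Q {
    private final List<Q> must = new ArrayList<>();
    private final List<Q> should = new ArrayList<>();
    private final List<Q> mustNot = new ArrayList<>();

    Bool add(Q q, Occur occur) {
      (occur == Occur.MUST ? must : occur == Occur.SHOULD ? should : mustNot).add(q);
      return this;
    }

    @Override public boolean matches(Map<String, Set<String>> doc) {
      for (Q q : mustNot) if (q.matches(doc)) return false;
      for (Q q : must) if (!q.matches(doc)) return false;
      if (!must.isEmpty()) return true;
      for (Q q : should) if (q.matches(doc)) return true;
      return false; // nothing positive matched; a purely negative Bool always ends here
    }
  }

  static final Map<String, Map<String, Set<String>>> DOCS = new LinkedHashMap<>();
  static {
    DOCS.put("da12", doc(Set.of(), Set.of(), Set.of("token1", "token2"), Set.of()));
    DOCS.put("da13-dd3", doc(Set.of(), Set.of(), Set.of("token1", "token3"), Set.of("token3")));
    DOCS.put("sa123-sd13", doc(Set.of("token1", "token2", "token3"), Set.of("token1", "token3"),
        Set.of(), Set.of()));
    DOCS.put("sa3-sd1-da23", doc(Set.of("token3"), Set.of("token1"), Set.of("token2", "token3"),
        Set.of()));
    DOCS.put("notoken", doc(Set.of(), Set.of(), Set.of(), Set.of()));
  }

  static Map<String, Set<String>> doc(Set<String> sa, Set<String> sd, Set<String> da, Set<String> dd) {
    return Map.of("allow_token_share", sa, "deny_token_share", sd,
        "allow_token_document", da, "deny_token_document", dd);
  }

  /** Same shape as calculateCompleteSubquery in the diff, optionally with a match-all MUST. */
  static Q subquery(String allow, String deny, String token, boolean withMatchAll) {
    Bool unprotected = new Bool()
        .add(anyValue(allow), Occur.MUST_NOT)
        .add(anyValue(deny), Occur.MUST_NOT);
    if (withMatchAll) unprotected.add(matchAll(), Occur.MUST);
    Bool or = new Bool().add(unprotected, Occur.SHOULD).add(term(allow, token), Occur.SHOULD);
    return new Bool().add(or, Occur.MUST).add(term(deny, token), Occur.MUST_NOT);
  }

  static List<String> matching(String token, boolean withMatchAll) {
    Q query = new Bool()
        .add(subquery("allow_token_share", "deny_token_share", token, withMatchAll), Occur.MUST)
        .add(subquery("allow_token_document", "deny_token_document", token, withMatchAll), Occur.MUST);
    List<String> ids = new ArrayList<>();
    DOCS.forEach((id, fields) -> { if (query.matches(fields)) ids.add(id); });
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(matching("token1", false)); // prints []
    System.out.println(matching("token1", true));  // prints [da12, da13-dd3, notoken]
  }
}
```

In this model the nested "empty ACL" clause contains only MUST_NOT entries, so it never matches and every document without share tokens drops out, which reproduces numFound="0". Adding a match-all MUST to that nested clause restores the three expected documents.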

          Karl Wright added a comment - - edited

          A purely negative query in Lucene has always matched nothing, so it looks like you need to add a MatchAllDocsQuery in there as a MUST (or SHOULD would work too) - I guess this is what Ryan was referring to earlier, sorry I didn't catch that detail.

          Sorry Erik, it's not purely negative. Note the parenthesization.

          +((-allow_token_share:* -deny_token_share:*) allow_token_share:token1 -deny_token_share:token1) +((-allow_token_document:* -deny_token_document:*) allow_token_document:token1 -deny_token_document:token1)
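
          The semantics this query is meant to express can be sketched in plain Java, independent of Lucene. This is only a model of the intended rule, not the plugin's code: the class and method names are hypothetical, and the token names "token1".."token3" are assumed from the 1/2/3 shorthand in the test data table earlier in this thread. A level (share or document) passes if it carries no tokens at all or one of its allow tokens matches the user, and none of its deny tokens matches; a document is visible only if both levels pass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of the allow/deny token rule; it does not use Lucene or Solr. */
final class TokenAccessModel {

    /** Token sets for one document: allow/deny at share level and at document level. */
    static final class Doc {
        final Set<String> shareAllow, shareDeny, docAllow, docDeny;

        Doc(Set<String> shareAllow, Set<String> shareDeny, Set<String> docAllow, Set<String> docDeny) {
            this.shareAllow = shareAllow;
            this.shareDeny = shareDeny;
            this.docAllow = docAllow;
            this.docDeny = docDeny;
        }
    }

    /** Mirrors one "+(...)" group: (no tokens at all OR allow matches) AND NOT deny matches. */
    static boolean levelPermits(Set<String> allow, Set<String> deny, Set<String> userTokens) {
        boolean unsecured = allow.isEmpty() && deny.isEmpty();
        boolean allowed = !Collections.disjoint(allow, userTokens);
        boolean denied = !Collections.disjoint(deny, userTokens);
        return (unsecured || allowed) && !denied;
    }

    /** Both required groups have to pass: share level and document level. */
    static boolean visible(Doc d, Set<String> userTokens) {
        return levelPermits(d.shareAllow, d.shareDeny, userTokens)
            && levelPermits(d.docAllow, d.docDeny, userTokens);
    }

    /** The five test documents from the table, in table order (token names are assumed). */
    static Map<String, Doc> testData() {
        Map<String, Doc> docs = new LinkedHashMap<>();
        docs.put("da12", new Doc(Set.of(), Set.of(), Set.of("token1", "token2"), Set.of()));
        docs.put("da13-dd3", new Doc(Set.of(), Set.of(), Set.of("token1", "token3"), Set.of("token3")));
        docs.put("sa123-sd13", new Doc(Set.of("token1", "token2", "token3"), Set.of("token1", "token3"), Set.of(), Set.of()));
        docs.put("sa3-sd1-da23", new Doc(Set.of("token3"), Set.of("token1"), Set.of("token2", "token3"), Set.of()));
        docs.put("notoken", new Doc(Set.of(), Set.of(), Set.of(), Set.of()));
        return docs;
    }

    /** Ids of the test documents visible to a user holding the given tokens. */
    static List<String> visibleIds(Set<String> userTokens) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Doc> e : testData().entrySet()) {
            if (visible(e.getValue(), userTokens)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
```

          Under this model, a user holding only token1 sees da12, da13-dd3, and notoken, which is the three-document result the test expects.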
          
          Ryan McKinley added a comment -

          A one-to-one conversion of filters to queries

          I don't understand why you want/need to do that. The filters/queries are all wrapped in a ConstantScoreQuery at the end – no need to convert the others.

          list.add(new ConstantScoreQuery(bf));
          

          Again, I think any discussion of QParser vs Component is irrelevant until there is a general consensus that any security stuff belongs anywhere in Solr. After that, the question of QParser vs Component is more about where we want this to go down the road...

          Karl Wright added a comment -

          I don't understand why you want/need to do that. The filters/queries are all wrapped in a ConstantScoreQuery at the end - no need to convert the others.

          I had intended to do some performance comparisons between filters and queries, but I never got that far because of the problem I ran into and reported. I thought that doing things in filterland might well be expensive when there are lots of documents in the system, which is one thing I wanted to explore. That, and also whether negated wildcards actually work reasonably or whether I should be introducing a "nothing here" special token that would make the wildcard queries unnecessary.

          Again, I think any discussion of QParser vs Component is irrelevant until there is a general consensus that any security stuff belongs anywhere in Solr. After that, the question of QParser vs Component is more about where we want this to go down the road...

          Whatever the decision is, I still have a problem to solve and I'm working towards solving it. Unless the Solr community decides that ManifoldCF's intrinsic model is anathema, and fights any implementation tooth and nail, this will still need to be done one way or another. Hope that's okay with you. In the meantime I believe I've found a trunk bug that seems pretty serious. If I get confirmation that I'm not just doing something stupid I'll open another ticket for it, but all I'm looking for right at the moment is a simple, "yeah, that should work", or "no, you idiot, you forgot xxx...."

          Karl Wright added a comment -

          I've opened ticket LUCENE-3450 for the BooleanQuery issue.

          Karl Wright added a comment -

          I now have 4 versions of the plugin, all of which still are SearchComponents. The four are:

          (1) uses filters and wildcards
          (2) uses queries and wildcards
          (3) uses filters and a special token to mark security fields that are "empty"
          (4) uses queries and a special token to mark security fields that are "empty"

          I've done some timings, using 5000 documents, a realistic number of user tokens (>100), for 3000 user queries. The numbers are interesting:

          Filter + wildcard = 193948ms
          Query + wildcard = 26137ms
          Filter + token = 39012ms
          Query + token = 25078ms

          Since the current implementation is the first, and that's obviously by far the worst performancewise, I recommend switching to a query-based implementation regardless of whether it's a SearchComponent or query parser plugin.

          Michael McCandless added a comment -

          Those are surprising results!

          I would have expected the BooleanFilter to be faster than BooleanQuery, if enough docs match (eg this is why MultiTermQuery's AUTO rewrite cuts over to a Filter once enough terms/docs match).

          And, separately, I would have expected a dedicated (indexed) token to be faster than using WildcardQuery instead, at least faster-er than you are seeing in the Query case.

          Curious...

          Karl Wright added a comment -

          The token vs. wildcard difference may be small because the number of distinct indexed token values is small. I'm going to try something bigger to see if the difference gets larger.

          There are good reasons to stick with the wildcard approach, having to do with documents that aren't indexed using Manifold. You really want them to be treated as if they are "open"; the token-based approach will cause them to be excluded, unfortunately. So I'm hoping the numbers stay good for wildcards. I'm going to try a few tricks Simon taught me to limit the number of query rewrites on those clauses, in order to make it as fast as possible.

          I don't understand the filter being so slow, however. That is indeed a surprise. As a result, I'll be attaching a query-based patch shortly.

          Karl Wright added a comment -

          Attached SOLR-1895-queries.patch, for query version of search component.

          Karl Wright added a comment -

          A lot of these timing tests, in retrospect, seem too variable to be meaningful. I'm getting timings with the query-based artifact that vary between 26s and 28s, but with filters I've now seen timings between 25s and 155s. Not sure why that should be, but there it is.

          Jan Høydahl added a comment -

          There are good reasons to stick with the wildcard approach, having to do with documents that aren't indexed using Manifold. You really want them to be treated as if they are "open"; the token-based approach will cause them to be excluded, unfortunately.

          There are ways to tackle this. One is to define a default for the token fields, ensuring that all documents with no values for these fields are explicitly defined as open:

            <field name="allow-tokens" type="string" default="all" />
            <field name="deny-tokens" type="string" default="none" />
          

          Another approach is to use an UpdateProcessor which sets these fields somewhat intelligently based on source, if they are missing.
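
          As a rough sketch of how the default-token idea could behave, here is a plain-Java model. It does not use Solr's UpdateRequestProcessor API: the class DefaultTokenSketch and its methods are hypothetical, documents are modelled as simple maps, and the field names and the "all"/"none" defaults are taken from the schema snippet above. It also assumes that the search side adds the "all" token to every user's token list, which is one possible way to use these defaults rather than something confirmed in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch of default tokens for unsecured documents; NOT the Solr update processor API. */
final class DefaultTokenSketch {
    static final String ALLOW_FIELD = "allow-tokens";
    static final String DENY_FIELD = "deny-tokens";
    static final String OPEN_TOKEN = "all";      // default allow value: open to everyone
    static final String NO_DENY_TOKEN = "none";  // default deny value: nobody is denied

    /** Returns a copy of the document with default tokens filled in for missing or empty fields. */
    static Map<String, List<String>> withDefaults(Map<String, List<String>> doc) {
        Map<String, List<String>> copy = new HashMap<>(doc);
        if (copy.getOrDefault(ALLOW_FIELD, List.of()).isEmpty()) {
            copy.put(ALLOW_FIELD, List.of(OPEN_TOKEN));
        }
        if (copy.getOrDefault(DENY_FIELD, List.of()).isEmpty()) {
            copy.put(DENY_FIELD, List.of(NO_DENY_TOKEN));
        }
        return copy;
    }

    /** Token-only check, no wildcards: every user implicitly holds the "all" token (an assumption). */
    static boolean visible(Map<String, List<String>> indexedDoc, Set<String> userTokens) {
        Set<String> effectiveAllow = new HashSet<>(userTokens);
        effectiveAllow.add(OPEN_TOKEN);
        boolean allowed = !Collections.disjoint(indexedDoc.get(ALLOW_FIELD), effectiveAllow);
        boolean denied = !Collections.disjoint(indexedDoc.get(DENY_FIELD), userTokens);
        return allowed && !denied;
    }
}
```

          One consequence of this design is that "all" and "none" become reserved values: a real access token with either name would change the meaning of the defaults.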

          Karl Wright added a comment -

          Uploaded queries-based version that uses the trick that Jan pointed out, so it is token-based and does not use wildcards.

          Karl Wright added a comment -

          Attached patch including both a SearchComponent and a QParserPlugin, in SOLR-1895-queries.patch.

          Karl Wright added a comment -

          Is there any possibility that the SOLR-1895-queries patch will be committed as nothing more complicated than a Solr mcf contrib module? That's how I've set it up, and based on the discussion for this ticket it seems there will always be an MCF-specific component in any case. If agreement is eventually reached and there is a true Solr security infrastructure, it should be a simple matter to redo the MCF pieces to use it later.

          Karl Wright added a comment -

          Updated to fix an outdated class reference

          Ryan McKinley added a comment -

          will be committed as nothing more complicated than a Solr mcf contrib module?

          If it is MCF specific, I doubt there will be consensus to commit it – if it really is MCF specific, then maintaining it in MCF makes the most sense.

          However, I don't think this is (or should be) MCF specific. The basic approach you take is general, and I would love to see some building blocks like this exist in Solr.

          Karl Wright added a comment -

          If it is MCF specific, I doubt there will be consensus to commit it – if it really is MCF specific, then maintaining it in MCF makes the most sense.

          Well, part of it is going to be MCF-specific - namely the part that talks to the MCF authority service and interprets the response. Erik made it clear that that part has to be in contrib, I think. He didn't sound like he had a problem with it being in Solr as long as it was in contrib though.

          However, I don't think this is (or should be) MCF specific. The basic approach you take is general, and I would love to see some building blocks like this exist in Solr.

          I would too, but I'm getting the impression that insufficient consensus exists to go forward with the general infrastructure. So I'm looking to see if consensus exists for a stop-gap solution, which is a contrib module that is MCF specific, with no common infrastructure (yet) until consensus can be achieved. When it is I can refactor the contrib module to use it. If this looks unlikely as well, I'll plan to build the infrastructure in the ManifoldCF world to release Solr contrib modules that support ManifoldCF's security model.

          Ryan McKinley added a comment -

          What about starting a solr-security project on apache-extras?

          I see this patch as a starting place for a security infrastructure. Given the reluctance, maybe it makes sense to let it bake elsewhere and revisit after more stuff exists. This may be a better home than MCF since it would keep things general enough that more people could use it. I would even suggest that the core of this should only depend on Lucene – not Solr.

          Karl Wright added a comment -

          What about starting a solr-security project on apache-extras?

          I'll have to think about this. If it was in the same svn I'd feel better about doing it that way. Nevertheless it would be straightforward to spin off solr-security from ManifoldCF at some point, as long as we do a good job of keeping stuff in contrib, seems to me. I could imagine an eventual lucene/solr subproject, which would be ideal - but until then, it's going to be in the wilderness somewhere.

          Jan Høydahl added a comment -

          To me it sounds like a good match for contrib, ensuring a clean plugin experience (apache-solr-mcf-security-4.0-SNAPSHOT.jar) that ships with Solr and is kept up-to-date. The size will be really small as well, no external jars. Later when we're ready for baby steps towards a generic security framework, we can refactor parts of this contrib into core with a single patch.

          One thing, will the contrib do more than security for MCF? If not, perhaps it should be renamed "mcf-security"?

          Anyone against mcf-security as a contrib?

          Karl Wright added a comment -

          Anyone against mcf-security as a contrib?

          No objection here. Do you want a new patch, or can you just move the directory and change build.xml to change the jar name?

          Chris Male added a comment -

          Anyone against mcf-security as a contrib?

          Yes I am (although I won't vote -1 on it). I don't feel it's a solution here; it's just pushing the problem into the corner and hoping it will be out of sight, out of mind. If this is going to be in the Solr codebase, why not make it core? It has no external dependencies and is tiny. If we feel that it shouldn't be in solr-core, I don't know why it belongs in contrib. Contrib shouldn't be where we go when we don't agree (code then gets ignored, falls out of sync, and ends up being sandboxed after 2 years).

          The issue seems to be whether this should be part of the solr project, or ManifoldCF.

          Erik Hatcher added a comment -

          Anyone against mcf-security as a contrib?

          No objection here either, though because this is such a small bit of code and is specific to MCF it seems best placed in MCF. The majority of people using Solr are not using MCF, though I'd venture to say that the majority of folks using MCF are using Solr. Its maintenance best fits under the committership of MCF, in my opinion. But again, no objections from me on it being a Solr contrib if others feel strongly about it.

          I personally don't see a distillation of this, anytime soon, into a common security framework within Solr so unless someone already has this generified and a second real-world implementation of it I don't find it a compelling argument to try to make this generic from the start. Though again, knock yourselves out folks. I'm glad to see this work being done, for sure, and I support the effort wherever it ultimately lives.

          Mark Miller added a comment -

          mcf-security as a contrib

          I'd prefer not - I think mcf-security belongs in MCF.

          Jan Høydahl added a comment -

          The most important thing to me is that great contributions like this are welcomed and brought out to Solr users somehow, whether the code lives here or there from the start. As I have a few projects in the works that need non-MCF security filtering integration with Solr, I'd be contributing to this code base.

          Another way to think of contrib is as a great way to introduce people to integration with components that may be of interest to their search app. A new Solr user would browse contrib and think "I don't need dataImportHandler, or clustering, but I could surely use extraction, langId and security". She may not even be familiar with any of Carrot2, Tika or MCF in advance. Connectors and security are important to more Solr users than e.g. clustering is, in my experience.

          Mark Miller added a comment -

          I still think MCF integration with SOLR should live in MCF.

          If there are more general 'filtering' components we can add to Solr, let's create new issues around them.

          I also think that if we go down the route of working on a general "security" framework, we shouldn't actually use the word security, as it implies much more than we will be providing based on any of what I have seen proposed. If we want to add a "document filtering" framework of some kind, that's something I would happily endorse. I still don't think we want to enter the 'security' game - there is a reason this has generally been pushed off to users in the past.

          Alexey Serba added a comment -

          Karl, could you please clarify

          • Does the ManifoldCF security component support an arbitrary security model, or is it just about Windows AD?
          • What's token_share? How is it different from token_document?
          • It seems that under the hood it generates "deny overrides allow" type of query. But I'm not sure that it is always the case, because afaiu the order of Access Control Entries (ACE) in Access Control List (ACL) is important.
          Karl Wright added a comment -

          * Does the ManifoldCF security component support an arbitrary security model, or is it just about Windows AD?

          The short answer is, "no"; it is not just about Windows AD. ManifoldCF supports AD, FileNet, Documentum, LiveLink, and Meridio authorization as well, and others get added periodically.

          It might help for you to read up on ManifoldCF. There's a book (http://www.manning.com/wright); you'd want to read Chapters 4 and 8. I can email them to you if you give me a preferred email address. I will also be presenting in Barcelona in a month on this topic.

          * What's token_share? How is it different from token_document?

          In order to be able to support AD, ManifoldCF allows multiple levels of token. There are allow/deny tokens for "share" level (which correspond in AD to Windows shares), "document" level (which correspond to Windows documents), and also N folder levels (which don't appear in Solr because the ManifoldCF Solr output connector won't accept documents that have security set on those). Share and document security operate completely independently of one another, but a document cannot be viewed unless it is allowed (and not denied) on BOTH levels.

          * It seems that under the hood it generates "deny overrides allow" type of query. But I'm not sure that it is always the case, because afaiu the order of Access Control Entries (ACE) in Access Control List (ACL) is important.

          That is actually not the case. This was tested extensively at MetaCarta in-house; there is no order dependency in AD or any other ManifoldCF-supported repository.

          Koji Sekiguchi added a comment -

          I fixed the patch for distributed search. I also modified the dot.classpath file in this patch.

          Chris Male added a comment -

          Is the intention to go ahead and add this as a contrib? I really feel that is a mistake.

          The most important thing to me is that great contributions like this are welcomed and brought out to Solr users somehow, whether the code lives here or there from the start. As I have a few projects in the works that need non-MCF security filtering integration with Solr, I'd be contributing to this code base.

          Absolutely, and with this code living in MCF, any MCF + Solr users will find it.

          Another way to think of contrib is as a great way to introduce people to integration with components that may be of interest to their search app. A new Solr user would browse contrib and think "I don't need dataImportHandler, or clustering, but I could surely use extraction, langId and security". She may not even be familiar with any of Carrot2, Tika or MCF in advance. Connectors and security are important to more Solr users than e.g. clustering is, in my experience.

          You have a valid point on the importance of some features over others, but that doesn't mean we should have them all in Solr. Contrib shouldn't be a dropoff point for everything that is related to Solr or Lucene, especially when there are active projects with active committers where the code could live. ManifoldCF seems to be all about Connectors and Security, so why should Solr get into that?

          Koji Sekiguchi added a comment -

          My intention in attaching the patch was not to add this to Solr contrib; sorry for the confusion. I just wanted to fix a real error that I found when I tried distributed search (it now works very well).

          Mark Miller added a comment -

          The Carrot contrib is not really an integration in the same way - it's a fully formed Solr module - we distribute all the files you need to run it in the contrib. The same goes for extraction and DIH - these are not integration points for other projects that we have agreed to maintain for them - these are fully formed, fully functioning contribs.

          Chris Male added a comment -

          I agree Mark.

          Jan Høydahl added a comment -

          I rest my case

          As long as this plugin is gonna be MCF-only, I follow the reasoning.
          But as Mark and Erik suggest, I hope we can continue down the road towards some security (or AccessToken filtering, hehe) code in core, which MCF and others may eventually benefit from.

          Karl Wright added a comment -

          I've created svn copies of both trunk and the 3.x branch of Solr/Lucene in the ManifoldCF project. From now on, please attach patches for this feature to ManifoldCF tickets, and I will commit them and propagate the built jars to ManifoldCF's trunk. The svn URLs are:

          https://svn.apache.org/repos/asf/incubator/lcf/integration/solr-4.x/trunk
          and
          https://svn.apache.org/repos/asf/incubator/lcf/integration/solr-3.x/trunk

          For the time being, these are managed as contrib modules within entire Solr/Lucene instances, in case the decision is revisited at some point in the future. The code is up-to-date as far as Koji's latest patch is concerned (it would be good for Koji to check this, however).

          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          Jan Høydahl added a comment -

          Closing this as Won't Fix, since the fix is checked in to MCF's source tree.


            People

            • Assignee:
              Unassigned
              Reporter:
              Karl Wright
            • Votes:
              2
              Watchers:
              6
