Solr / SOLR-527

An XML commit only request handler

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: update
    • Labels:
      None

      Description

      This request handler only permits <commit/> messages. It is provided as one way to prevent adds and deletes on a Solr slave machine that could potentially be accessed by outside parties where a firewall or other access control is either not possible or not desired.
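The approach described here is a handler that inspects the incoming XML itself and lets only `<commit/>` through. It can be sketched as a standalone check. This is a minimal sketch using the JDK's StAX parser, not the code in SOLR-527.patch. The class `CommitOnlyFilter` and its method `isCommitOnly` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper (not from SOLR-527.patch): accepts an XML update
// message only if every top-level element is <commit>.
final class CommitOnlyFilter {
    private CommitOnlyFilter() {}

    static boolean isCommitOnly(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Untrusted input: don't process DTDs (and thus external entities).
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(new StringReader(xml));
            int depth = 0;
            boolean sawCommit = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Only top-level elements decide what the message does.
                    if (depth == 0) {
                        if (!"commit".equals(reader.getLocalName())) {
                            return false; // <add>, <delete>, <optimize>, ...
                        }
                        sawCommit = true;
                    }
                    depth++;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            return sawCommit;
        } catch (XMLStreamException e) {
            return false; // malformed input is rejected, not forwarded
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }
}
```

A handler built this way would forward the body to the normal update path only when `isCommitOnly` returns true. Ryan's first comment below objects to exactly this duplicated parsing.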

      1. ReadOnlyUpdateProcessorFactory.java
        2 kB
        Sean Timm
      2. ReadOnlyUpdateProcessorFactory.java
        2 kB
        Ryan McKinley
      3. ReadOnlyUpdateProcessorFactory.java
        2 kB
        Ryan McKinley
      4. SOLR-527.patch
        5 kB
        Sean Timm

        Activity

        Ryan McKinley added a comment -

        I'm a little reluctant to have another request handler doing its own XML parsing to limit the normal functionality. Perhaps a better solution would be an UpdateRequestProcessor that throws an error for <add> and <delete>?
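Ryan's suggestion leaves parsing to the normal update path and rejects adds and deletes further down the processor chain. The sketch is self-contained, so it uses simplified, hypothetical stand-in types (`UpdateProcessor`, `RecordingProcessor`, `ReadOnlyUpdateProcessor`, and plain document IDs). These are loosely modeled on Solr's `UpdateRequestProcessor` with its `processAdd`/`processDelete`/`processCommit` callbacks. This is not the attached ReadOnlyUpdateProcessorFactory.java.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for Solr's UpdateRequestProcessor;
// the real class receives command objects (AddUpdateCommand, etc.).
interface UpdateProcessor {
    void processAdd(String docId);
    void processDelete(String docId);
    void processCommit();
}

// End of the chain: records what would have been applied to the index.
final class RecordingProcessor implements UpdateProcessor {
    final List<String> log = new ArrayList<>();
    @Override public void processAdd(String docId) { log.add("add:" + docId); }
    @Override public void processDelete(String docId) { log.add("delete:" + docId); }
    @Override public void processCommit() { log.add("commit"); }
}

// Placed early in the chain: rejects index modifications, forwards commits.
final class ReadOnlyUpdateProcessor implements UpdateProcessor {
    private final UpdateProcessor next;

    ReadOnlyUpdateProcessor(UpdateProcessor next) { this.next = next; }

    @Override public void processAdd(String docId) {
        throw new UnsupportedOperationException("add rejected: this core is read-only");
    }

    @Override public void processDelete(String docId) {
        throw new UnsupportedOperationException("delete rejected: this core is read-only");
    }

    @Override public void processCommit() { next.processCommit(); }
}
```

The check sits in the chain rather than in a request handler, so there is no second XML parser to maintain.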

        Ryan McKinley added a comment -

        D'oh, should have looked at the file before uploading...

        Mike Klaas added a comment -

        Is this a generally-useful feature? I'm not sure how often this use case would occur.

        Yonik Seeley added a comment -

        > Is this a generally-useful feature? I'm not sure how often this use case would occur.

        I agree it's not.
        I think Ryan's solution is the most elegant, and that it should remain a custom extension (i.e. not committed).

        Hoss Man added a comment -

        For the record: allowing arbitrary outside parties to issue commits on a slave is almost as dangerous as allowing adds/deletes. While the data itself can't be poisoned by a commit, you could DoS the slave by making it thrash as it warms searchers over and over again.

        If the goal is a read-only slave that can still be triggered to load new snapshots, perhaps an alternate method of snapshot loading (one that isn't network-accessible) is in order, i.e., a variation on autocommit that periodically polls the index directory to see whether it has changed.
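Hoss's alternative is to reload only when the index directory itself changes. This needs no network-reachable command at all. Below is a minimal sketch. It assumes a new snapshot is detectable from file names, sizes, and modification times. `IndexDirPoller` is a hypothetical name. In Solr, the `onChange` callback would stand in for opening a new searcher.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical poller: detects that a new snapshot has landed in the index
// directory and runs a callback (in Solr, that would reopen the searcher).
final class IndexDirPoller {
    private final Path indexDir;
    private final Runnable onChange;
    private Map<String, String> lastSeen;

    IndexDirPoller(Path indexDir, Runnable onChange) throws IOException {
        this.indexDir = indexDir;
        this.onChange = onChange;
        this.lastSeen = fingerprint();
    }

    // Maps each regular file name to "size@mtime"; adding, deleting or
    // rewriting a file changes the map.
    private Map<String, String> fingerprint() throws IOException {
        Map<String, String> files = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(indexDir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    files.put(p.getFileName().toString(),
                            Files.size(p) + "@" + Files.getLastModifiedTime(p).toMillis());
                }
            }
        }
        return files;
    }

    // Meant to be called periodically; returns true if a change triggered the callback.
    boolean pollOnce() throws IOException {
        Map<String, String> current = fingerprint();
        if (current.equals(lastSeen)) {
            return false;
        }
        lastSeen = current;
        onChange.run();
        return true;
    }
}
```

The periodic check can be driven by `ScheduledExecutorService.scheduleWithFixedDelay`, with `pollOnce()` wrapped to handle its `IOException`. Nothing in this path listens on the network.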

        Ryan McKinley added a comment -

        I agree neither approach should be committed.

        But I sympathize with the general request to make slave machines read-only. I think the problem Sean is trying to solve is having confidence that his slave machines won't accidentally be updated, or something like that.

        Another approach is to make the file system read-only for the slave process (while still letting rsync write to it).

        Perhaps with SOLR-465, there could be a read-only Directory instance...
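A read-only Directory would refuse writes at the storage layer. Then neither the update chain nor a misconfigured handler could modify the index. The sketch uses a hypothetical, simplified `IndexStore` interface rather than Lucene's real `Directory` class, whose API differs across versions. It only illustrates the wrapper idea. `InMemoryStore` and `ReadOnlyStore` are likewise illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified storage interface (not Lucene's Directory API).
interface IndexStore {
    byte[] read(String name);
    void write(String name, byte[] data);
    void delete(String name);
}

// Trivial backing store so the example runs on its own.
final class InMemoryStore implements IndexStore {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override public byte[] read(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }

    @Override public void write(String name, byte[] data) { files.put(name, data.clone()); }

    @Override public void delete(String name) { files.remove(name); }
}

// Wrapper that forwards reads and rejects every mutating call.
final class ReadOnlyStore implements IndexStore {
    private final IndexStore delegate;

    ReadOnlyStore(IndexStore delegate) { this.delegate = delegate; }

    @Override public byte[] read(String name) { return delegate.read(name); }

    @Override public void write(String name, byte[] data) {
        throw new UnsupportedOperationException("read-only index: cannot write " + name);
    }

    @Override public void delete(String name) {
        throw new UnsupportedOperationException("read-only index: cannot delete " + name);
    }
}
```

The slave would open its index through the wrapper. The replication tooling (rsync) would keep writing to the underlying files directly.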

        Sean Timm added a comment -

        Thanks all for taking a look at this.

        The ReadOnlyUpdateProcessorFactory.java is great, Ryan. I didn't realize that update processor factories could be chained. That is cleaner than my solution.

        Hoss, I realize that DoS attacks are still possible; however, my bigger concern is someone modifying our index (e.g., injecting a Viagra advert or the like).

        patrick o'leary added a comment -

        I have a couple of questions around this.

        1. Should there be a default update mechanism if none is specified in solrconfig.xml?
          • Even if I rip out the request handlers for /update, updates are still possible through SolrUpdateServlet, because SolrCore loads ChainUpdateProcessFactory by default, and that loads RunUpdateProcessorFactory by default. That's not what I'd expect.
        2. Should the UpdateCmd carry some form of context about the origin of an update, even just a string?
          • If embedded, it could store anything from a file name to a DB name; if over HTTP, the peer IP could be stored through the UpdateServlet or RequestDispatcher.
          • This would give custom update chains the ability to make decisions based on the origin of a document.
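Patrick's second question is whether to tag each update command with an origin string, so that a custom chain can decide what to accept. That could look roughly like the sketch below. `OriginTaggedCommand`, `OriginFilter`, and the `scheme:detail` origin format are all hypothetical, introduced only for illustration.

```java
import java.util.List;

// Hypothetical command that records where it came from, e.g.
// "http:203.0.113.9" or "file:/data/feed.xml" (format is an assumption).
final class OriginTaggedCommand {
    final String action; // "add", "delete" or "commit"
    final String origin;

    OriginTaggedCommand(String action, String origin) {
        this.action = action;
        this.origin = origin;
    }
}

// Custom chain step: commits from anywhere, modifications only from trusted origins.
final class OriginFilter {
    private final List<String> trustedPrefixes;

    OriginFilter(List<String> trustedPrefixes) {
        this.trustedPrefixes = trustedPrefixes;
    }

    boolean allows(OriginTaggedCommand cmd) {
        if ("commit".equals(cmd.action)) {
            return true;
        }
        for (String prefix : trustedPrefixes) {
            if (cmd.origin.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Hoss's reply argues against this design: update processors are meant to be origin-agnostic, so checks like this belong at the edge of the communication channel.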

        Overall I'd like to have the ability to determine if I should in fact allow an add / update / commit to go through, for both web based and non-web containers. But I definitely want to have the ability to switch it off.

        Hoss Man added a comment -

        > Should there be a default update mechanism if none are specified in the solrconfig.xml?

        For backwards compatibility, yes, but the simple way to prevent all updates is to map something else to /update. A NoOpRequestHandler would be useful here (don't we already have one of those?)
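Hoss's point is that whatever handler is mapped to `/update` determines whether updates get in. As a self-contained illustration (not Solr's handler registry), the sketch below uses a path-to-handler table. A later registration overrides `/update` with a handler that refuses every request. `HandlerTable` and the response strings are hypothetical. As Patrick noted above, SolrUpdateServlet can still reach the update chain, so remapping a path only covers requests routed through that mapping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical path -> handler table; a handler turns a request body into a response line.
final class HandlerTable {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    void register(String path, Function<String, String> handler) {
        handlers.put(path, handler); // a later registration replaces an earlier one
    }

    String dispatch(String path, String body) {
        Function<String, String> handler = handlers.get(path);
        return handler == null ? "404 no handler for " + path : handler.apply(body);
    }
}
```

Usage: register the default updater, then override `/update` with a rejecting handler.

```java
HandlerTable table = new HandlerTable();
table.register("/update", body -> "200 applied");
table.register("/update", body -> "403 updates are disabled");
String response = table.dispatch("/update", "<add/>");
```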

        > • If embedded, it could be used to store anything from a file name to a DB name; if HTTP, the peer IP could be stored through the UpdateServlet or RequestDispatcher.
        > • Would allow custom update chains some ability to make a decision based on the origin of a document.

        UpdateProcessors shouldn't know or care where a command originated. That's largely the point: they are an agnostic way to hook into all index modification commands regardless of origin. Logic that accepts or rejects commands based on where they came from needs to know about the channel of communication, so that logic should live as close to the source of that channel as possible.

        > Overall I'd like to have the ability to determine if I should in fact allow an add / update / commit to go through, for both web based and non-web containers. But I definitely want to have the ability to switch it off.

        We generally try to keep Solr out of the business of authorization/security. If you are embedding Solr, make the wrapper code decide what/when to allow commands through; if you are using Solr as a webapp, configure your servlet container with whatever path-based security you want.
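For the embedded case, Hoss's advice is to let the wrapper code decide which commands get through before they ever reach Solr. Below is a minimal sketch. `GuardedIndexClient` is a hypothetical name. The `Consumer<String>` is an assumed stand-in for whatever call actually hands the message to the embedded core. The policy is supplied by the embedding application.

```java
import java.util.function.BiPredicate;
import java.util.function.Consumer;

// Hypothetical wrapper owned by the application that embeds Solr.
// The policy sees who is calling; the update path behind it never does.
final class GuardedIndexClient {
    private final Consumer<String> updateSink;        // stand-in for the embedded update call
    private final BiPredicate<String, String> policy; // (caller, message) -> allowed?

    GuardedIndexClient(Consumer<String> updateSink, BiPredicate<String, String> policy) {
        this.updateSink = updateSink;
        this.policy = policy;
    }

    // Returns false, without forwarding, when the policy rejects the message.
    boolean submit(String caller, String message) {
        if (!policy.test(caller, message)) {
            return false;
        }
        updateSink.accept(message);
        return true;
    }
}
```

Switching the check off is just a matter of passing a policy that always returns true. For the webapp case, the equivalent lives in the servlet container's own security configuration rather than in code like this.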

        Sean Timm added a comment (edited) -

        I serendipitously discovered what is probably the cleanest way to allow only commits on the slave. If the index is owned by user A with permissions

        "-rw-r--r--"

        and the slave Solr process runs as user B, only read operations are allowed. This is obvious in retrospect; I just didn't think of it.
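Sean's observation comes down to ordinary POSIX permission bits. With mode `rw-r--r--`, only the owner class has write access, so a process running as a different user can read the file but not modify it. Creating or deleting files is governed by the directory's own permissions, so the index directory must also deny user B write access. The sketch decodes the mode string with `java.nio.file.attribute.PosixFilePermissions`. The helper `PermissionCheck.canWrite` is hypothetical and ignores root, ACLs, and directory bits.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper: decides whether a process may write a file, given an
// ls-style mode such as "-rw-r--r--". Ignores root, ACLs and directory bits.
final class PermissionCheck {
    private PermissionCheck() {}

    static boolean canWrite(String mode, boolean isOwner, boolean inGroup) {
        // PosixFilePermissions.fromString expects exactly 9 characters,
        // so drop the leading file-type character of the ls form.
        String bits = mode.length() == 10 ? mode.substring(1) : mode;
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(bits);
        // POSIX checks exactly one class: owner, else group, else others.
        if (isOwner) {
            return perms.contains(PosixFilePermission.OWNER_WRITE);
        }
        if (inGroup) {
            return perms.contains(PosixFilePermission.GROUP_WRITE);
        }
        return perms.contains(PosixFilePermission.OTHERS_WRITE);
    }
}
```

For Sean's setup, `canWrite("-rw-r--r--", true, false)` (user A) is true. `canWrite("-rw-r--r--", false, false)` (user B) is false.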

        Noble Paul added a comment -

        Can we add an attribute to the current UpdateHandler?

        In solrconfig.xml:
         <commitOnly>true</commitOnly>
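Noble's proposal amounts to a single boolean read from the update handler's configuration. The `<updateHandler class="solr.DirectUpdateHandler2">` element is part of the stock solrconfig.xml. `commitOnly` is only the option proposed here, not a setting Solr is known to support. `CommitOnlyConfig` is a hypothetical name. This is a minimal sketch of reading such a flag with the JDK's DOM parser and applying it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical: reads the proposed <commitOnly> option (not an existing
// Solr setting) and applies it to incoming update actions.
final class CommitOnlyConfig {
    final boolean commitOnly;

    private CommitOnlyConfig(boolean commitOnly) {
        this.commitOnly = commitOnly;
    }

    // Parses an <updateHandler> fragment; a missing element means "off".
    static CommitOnlyConfig parse(String updateHandlerXml)
            throws IOException, ParserConfigurationException, SAXException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(updateHandlerXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("commitOnly");
        boolean flag = nodes.getLength() > 0
                && Boolean.parseBoolean(nodes.item(0).getTextContent().trim());
        return new CommitOnlyConfig(flag);
    }

    // With the flag off everything passes; with it on, only commits do.
    boolean allows(String action) {
        return !commitOnly || "commit".equals(action);
    }
}
```

Defaulting to "off" when the element is absent keeps existing configurations unchanged. This matches Patrick's wish to be able to switch the restriction off.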
        
        Sean Timm added a comment -

        Updated to work with recently committed SOLR-660.

        Jan Høydahl added a comment -

        Things have changed now with SolrCloud, which needs to ADD docs to slaves. If you don't want to enable firewalls, secure Solr with SSL auth instead; see SOLR-4470.

        Closing, please re-open if you feel this is still relevant.


          People

          • Assignee:
            Unassigned
            Reporter:
            Sean Timm
          • Votes:
            1
            Watchers:
            3
