Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-15589

DocRouter API needs redesigned to support stricter routing rules



    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None


      This issue i being opened to track some of the things observed in SOLR-8889 about how/why the existing DocRouter API (and it's usages in solrj/solr) make it so hard to prevent/identify bugs like that.

      The hope is that someone with some time/energy/interest might wish to pursue redesigning the DocRouter API to be more "strict" so that any future bugs like SOLR-8889 will cause "fail fast" behavior because of missing info that the router knows is required, instead of the current behavior of sending updates to the wrong shard due to poor assumptions...


      Comments from SOLR-8889

      3) There does not appear to be any existing logic to “fail hard” in Solr if an update (add or delete) for a collection using router.field is processed w/o the field (or explicit _route_  param). Instead the update is silently sent to the wrong shard. (there might be a check/failure situation for this in the add/update case, but I don’t see it offhand, and certainly not in the deleteById case)

      ... looking into what it would take to "fix" #3 in some way – perhaps along the lines of Yonik's comment in SOLR-5890 suggesting a "strict" mode when using router.field) – because that seems like it would be the most robust way we can be "future proof" against any bugs (like #1 & #2 above) in solrj/solr-core code silently sending updates to the wrong shard leaders.

      Unfortunately it doesn’t seem very viable to fix #3 w/o a LOT of serious changes to the DocRouter API. The general categories of problems I’m seeing are:

      • Sloppy/Loose API Contract
        • some methods are documented to return “null” if there is not enough details to determine the correct Slice for an update, but some impls throw exceptions instead (some of the Update Chain processing code actually expects DocRouter to throw exceptions in spite of the API documentation
        • some methods take in a “route” argument in addition to an "id" argument ... but not all methods
        • some callers act aware of when a “route” has been specified by the user, and pass that the DocRouter methods as the "id" argument in place of the document's uniqueKey instead of passing it where the “route” arg is suppose to be passed – so the DocRouter impls never really know whether the “id” they have been given is from the uniqueKey field or from the router.field
        • Some code in Solr's Update Chain assumes a CompositeIdRouter is in use, and directly calls internal (but public) methods from the impl (which are not part of the DocRouter API) that don’t have access to the router.field let alone know if the input is a uniqueKey value or a router.field value (and worse: some of this code silently does nothing if some other DocRouter is used)
      • Overly Permissive Functional Design:
        • IIUC it actually seems like router.field was designed with the expectation that if an "add" command includes a SolrInputDocument that doesn’t include a value in the router field, then the uniqueKey should be used as a callback to decide which shard to send it to. (ie: documents aren't required to have a value in the router.field

      Ultimately i think the entire DocRouter API (and it’s usage in Solr) should be reconsidered, with a (strawman) goal of adding a strict option that (when true for a collection, and in some future version of solr it should default to true) would cause a DocRouter with router.field set to throw an Exception anytime any DocRouter caller calls any routing related method w/o specifying a value for the router.field. Unfortunately this seems like it would take more time/effort then I have to focus on it right now, so I plan to spin it off into it's own Improvement jira.


        Issue Links



              Unassigned Unassigned
              hossman Chris M. Hostetter
              0 Vote for this issue
              1 Start watching this issue