SOLR-14190

Add multi-shard support to TaggerRequestHandler

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: master (9.0)
    • Fix Version/s: None
    • Component/s: query
    • Labels: None

      Description

      As documented in the ref-guide, the Tagger Handler currently only works on single-shard collections.

      Users who invoke /tag on a multi-shard collection get results that reflect the tags from only one of the shards. This is easy to reproduce with the tagger tutorial in the docs. If the geonames collection is created with multiple shards (e.g. bin/solr create -c geonames2 -shards 2, matching the collection name used in the requests below), then the tags returned by the API depend on which shard ends up serving the request. Repeating the same request returns different results:

      ➜  solr git:(master) ✗ curl -X POST 'http://localhost:8983/solr/geonames2/tag?overlaps=NO_SUB&tagsLimit=5000&fl=id,name,countrycode&wt=json&indent=on' -H 'Content-Type:text/plain' -d 'Hello New York City' 
      {
        "responseHeader":{...},
        "tagsCount":2,
        "tags":[[
            "startOffset",10,
            "endOffset",14,
            "ids",["4098776",
              "4562407"]],
          [
            "startOffset",15,
            "endOffset",19,
            "ids",["8347868"]]],
        "response":{"numFound":3,"start":0,"docs":[
            {"id":"8347868", "name":["City"], "countrycode":["AU"]},
            {"id":"4098776", "name":["York"], "countrycode":["US"]},
            {"id":"4562407", "name":["York"], "countrycode":["US"]}]
        }}
      ➜  solr git:(master) ✗ curl -X POST 'http://localhost:8983/solr/geonames2/tag?overlaps=NO_SUB&tagsLimit=5000&fl=id,name,countrycode&wt=json&indent=on' -H 'Content-Type:text/plain' -d 'Hello New York City'
      {
        "responseHeader":{...},
        "tagsCount":1,
        "tags":[[
            "startOffset",6,
            "endOffset",19,
            "ids",["5128581"]]],
        "response":{"numFound":1,"start":0,"docs":[
            {"id":"5128581", "name":["New York City"], "countrycode":["US"]}]
        }}
      

      Nothing inherent to /tag prevents it from handling multi-shard requests; it just wasn't a priority when the initial implementation was written. We should add distributed support to this request handler.
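
      As a rough illustration of what a distributed /tag would have to do, the sketch below merges the two responses shown above on the client side and then re-applies the overlaps=NO_SUB rule (as the ref-guide describes it: do not emit a tag that lies completely within another tag). This is only a Python sketch under stated assumptions, not a proposed Solr implementation: the function names and the labels shard_a and shard_b are hypothetical, the issue does not say which shard produced which response, and a real implementation would live in the Java handler and would also have to merge the matching docs and honor tagsLimit.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (startOffset, endOffset)


def parse_tags(flat_tags: List[list]) -> Dict[Span, Set[str]]:
    """Turn the flat named-list tags from the JSON response into {(start, end): {ids}}."""
    parsed: Dict[Span, Set[str]] = {}
    for flat in flat_tags:
        # Each tag looks like ["startOffset", 10, "endOffset", 14, "ids", [...]].
        fields = dict(zip(flat[0::2], flat[1::2]))
        span = (fields["startOffset"], fields["endOffset"])
        parsed.setdefault(span, set()).update(fields["ids"])
    return parsed


def merge_shard_tags(shard_tag_lists: List[List[list]]) -> Dict[Span, Set[str]]:
    """Union the tags of all shards, combining ids that share the same offsets."""
    merged: Dict[Span, Set[str]] = {}
    for flat_tags in shard_tag_lists:
        for span, ids in parse_tags(flat_tags).items():
            merged.setdefault(span, set()).update(ids)
    return merged


def drop_sub_tags(tags: Dict[Span, Set[str]]) -> Dict[Span, Set[str]]:
    """Re-apply NO_SUB after merging: drop tags completely within another tag."""

    def is_within(inner: Span, outer: Span) -> bool:
        return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]

    return {
        span: ids
        for span, ids in tags.items()
        if not any(is_within(span, other) for other in tags)
    }


# The "tags" arrays from the two responses in this issue (shard labels are hypothetical).
shard_a = [
    ["startOffset", 10, "endOffset", 14, "ids", ["4098776", "4562407"]],
    ["startOffset", 15, "endOffset", 19, "ids", ["8347868"]],
]
shard_b = [
    ["startOffset", 6, "endOffset", 19, "ids", ["5128581"]],
]

merged = drop_sub_tags(merge_shard_tags([shard_a, shard_b]))
print(merged)  # {(6, 19): {'5128581'}}
```

      With the sample data, "York" (10-14) and "City" (15-19) both fall inside "New York City" (6-19), so only the longer tag survives. The overlap rule has to be applied again after the merge because each shard can only reduce overlaps among the tags it found itself.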

    People

    • Assignee: Unassigned
    • Reporter: Jason Gerlowski (gerlowskija)