Solr / SOLR-15008

Avoid building OrdinalMap for each facet


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 8.7
    • Fix Version/s: None
    • Component/s: Facet Module

    Description

      I'm running into the following scenario:

      • [JSON] faceting on a high cardinality field
      • few matching documents => few unique values

      Yet the query almost always takes a long time. Here's an example (edited a bit) that takes almost 4s to facet ~300 matching documents and their unique values:

       

          "QTime":3869,
          "params":{
            "json":"{\"query\": \"*:*\",
            \"filter\": [\"type:test_type\", \"date:[1603670360 TO 1604361599]\", \"unique_id:49866\"],
            \"facet\": {\"keywords\":{\"type\":\"terms\",\"field\":\"keywords\",\"limit\":20,\"mincount\":20}}}",
            "rows":"0"}},
        "response":{"numFound":333,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[]
        },
        "facets":{
          "count":333,
          "keywords":{
            "buckets":[{
                "val":"value1",
                "count":124},
        ...
      

      I did some profiling with our Sematext Monitoring, and it points me to OrdinalMap building (see attached screenshot). If I read the code right, an OrdinalMap is built for every faceting request. That is expensive because there are many unique values in the shard (previously there were more, smaller shards, which gave better latency, but that approach doesn't scale for this particular use case).
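To make the cost argument concrete, here is a simplified Java sketch of what building an ordinal map involves. It is a hypothetical illustration, not Lucene's actual `OrdinalMap` code, and the class and method names are made up. Each segment has its own sorted term dictionary; a k-way merge assigns shard-wide (global) ordinals and records, per segment, which global ordinal each segment ordinal maps to. Every unique term in the shard is visited, no matter how few documents match the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class OrdinalMapSketch {

    /** Cursor over one segment's sorted, deduplicated terms (hypothetical helper). */
    private static final class Cursor {
        final int segment;
        final String[] terms;
        int ord = 0;

        Cursor(int segment, String[] terms) {
            this.segment = segment;
            this.terms = terms;
        }

        String term() {
            return terms[ord];
        }
    }

    /**
     * Returns segmentToGlobal[s][segmentOrd] = global ordinal of that term.
     * Every term of every segment is visited once, so the cost grows with the
     * number of unique values in the shard, not with the number of matching docs.
     */
    public static int[][] build(List<String[]> segments) {
        int[][] segmentToGlobal = new int[segments.size()][];
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
        for (int s = 0; s < segments.size(); s++) {
            String[] terms = segments.get(s);
            segmentToGlobal[s] = new int[terms.length];
            if (terms.length > 0) {
                queue.add(new Cursor(s, terms));
            }
        }
        int globalOrd = -1;
        String previous = null;
        while (!queue.isEmpty()) {
            Cursor c = queue.poll(); // smallest remaining term across all segments
            String term = c.term();
            if (!term.equals(previous)) { // first time this term is seen: new global ordinal
                globalOrd++;
                previous = term;
            }
            segmentToGlobal[c.segment][c.ord] = globalOrd;
            c.ord++;
            if (c.ord < c.terms.length) {
                queue.add(c); // re-insert the cursor positioned on its next term
            }
        }
        return segmentToGlobal;
    }

    public static void main(String[] args) {
        List<String[]> segments = new ArrayList<>();
        segments.add(new String[] {"apple", "kiwi", "pear"});
        segments.add(new String[] {"banana", "kiwi"});
        int[][] map = build(segments);
        System.out.println(Arrays.toString(map[0])); // prints [0, 2, 3]
        System.out.println(Arrays.toString(map[1])); // prints [1, 2]
    }
}
```

In this toy example, `kiwi` gets the same global ordinal (2) in both segments. With millions of unique values per shard, this merge is what a query pays for even when it matches only a few hundred documents.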

      If I'm right so far, I see a couple of potential improvements, inspired by Elasticsearch:

      1. Keep the OrdinalMap cached until the next softCommit, so that only the first query takes the penalty.
      2. Allow faceting on actual values (a Map) rather than ordinals, for situations like the one above where there are few matching documents. We could potentially auto-detect this scenario (e.g. via a configurable threshold on the number of matching documents) and use a Map when there are few documents.
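The two ideas above could look roughly like this. This is a hypothetical Java sketch, not Solr or Elasticsearch code. `PerGenerationCache` assumes the caller can supply a number that changes whenever a new searcher is opened (for example after a soft commit). `countByValue` assumes the matching documents' values have already been read as strings, one deduplicated list per document. The threshold in `useValueMap` is likewise an assumed, configurable value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

public class FacetStrategySketch {

    /**
     * Idea 1 (hypothetical): build an expensive structure such as an ordinal map
     * once per index generation and reuse it until the generation changes,
     * e.g. when the next soft commit opens a new searcher.
     */
    public static final class PerGenerationCache<V> {
        private final LongFunction<V> builder;
        private boolean hasValue = false;
        private long generation;
        private V value;
        public int builds = 0; // number of times the expensive build actually ran

        public PerGenerationCache(LongFunction<V> builder) {
            this.builder = builder;
        }

        public synchronized V get(long currentGeneration) {
            if (!hasValue || currentGeneration != generation) {
                value = builder.apply(currentGeneration); // only the first query per generation pays
                generation = currentGeneration;
                hasValue = true;
                builds++;
            }
            return value;
        }
    }

    /**
     * Idea 2 (hypothetical): when few documents match, count the actual values
     * in a hash map, so no global ordinals (and no ordinal map) are needed.
     */
    public static Map<String, Integer> countByValue(List<List<String>> valuesPerMatchingDoc) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> docValues : valuesPerMatchingDoc) {
            for (String value : docValues) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Hypothetical auto-detection: use the value map at or below a configured threshold. */
    public static boolean useValueMap(int numMatchingDocs, int threshold) {
        return numMatchingDocs <= threshold;
    }

    public static void main(String[] args) {
        PerGenerationCache<String> cache = new PerGenerationCache<>(gen -> "ordinal map for generation " + gen);
        cache.get(1); // builds
        cache.get(1); // reuses the cached value
        cache.get(2); // new searcher after a soft commit: rebuilds
        System.out.println(cache.builds); // prints 2

        List<List<String>> docs = List.of(List.of("value1", "value2"), List.of("value1"));
        if (useValueMap(docs.size(), 1000)) {
            System.out.println(countByValue(docs).get("value1")); // prints 2
        }
    }
}
```

The cache only pays off when queries are much more frequent than soft commits. The value-map path replaces array indexing by ordinal with hashing of actual values, which is only likely to be cheaper when few documents match.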

      I'm curious what you think:

      • Would a PR/patch be welcome for either of the two ideas above?
      • Do you see better options? Am I missing something?

       

      Attachments

        1. Screenshot 2020-11-19 at 12.01.55.png
          263 kB
          Radu Gheorghe
        2. writes_commits.png
          347 kB
          Radu Gheorghe


          People

            Assignee: Unassigned
            Reporter: Radu Gheorghe
            Votes: 0
            Watchers: 2
