SOLR-711

SimpleFacets: Performance Boost for Tokenized Fields for smaller DocSet using Term Vectors

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: search
    • Labels: None

      Description

      From http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html:

      Scenario:

      • 10,000,000 documents in the index;
      • 5-10 terms per document;
      • 200,000 unique terms for a tokenized field.

      Calculating the sizes of 200,000 intersections with the FilterCache (one per unique term) is 100 times slower than traversing the 10 - 20,000 documents of a smaller DocSet and counting the frequencies of the terms found in their term vectors.

      This approach is not applicable if the size of the DocSet is close to the total number of unique tokens (200,000 in our scenario).
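
      The two strategies compared above can be sketched as follows. This is only an illustration under stated assumptions, not the SimpleFacets code: plain Java collections stand in for Lucene term vectors and for the per-term document sets held in the FilterCache, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Solr.
final class TermVectorFacetSketch {

    // Strategy proposed in this issue: walk only the documents in the DocSet
    // and count the terms each one contains. Cost grows with
    // (docs in DocSet) x (terms per doc).
    // termVectors is an assumed stand-in: docId -> terms of the faceted field.
    static Map<String, Integer> countByTraversal(
            Map<Integer, List<String>> termVectors, Set<Integer> docSet) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc : docSet) {
            List<String> terms = termVectors.getOrDefault(doc, List.of());
            // Facet counts are per document, so a term repeated inside one
            // document is counted once.
            for (String term : new HashSet<String>(terms)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Baseline: one intersection per unique term. Cost grows with the number
    // of unique terms, regardless of how small the DocSet is.
    // postings is an assumed stand-in: term -> ids of documents containing it.
    static Map<String, Integer> countByIntersection(
            Map<String, Set<Integer>> postings, Set<Integer> docSet) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<Integer>> entry : postings.entrySet()) {
            int size = 0;
            for (int doc : entry.getValue()) {
                if (docSet.contains(doc)) {
                    size++;
                }
            }
            if (size > 0) {
                counts.put(entry.getKey(), size);
            }
        }
        return counts;
    }
}
```

      Both methods return the same counts for terms that occur in the DocSet; they differ only in what they iterate over, which is why the traversal variant is attractive when the DocSet is small relative to the number of unique terms.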

      See SimpleFacets.java:

      public NamedList getFacetTermEnumCounts(
        SolrIndexSearcher searcher, 
        DocSet docs, ...
      


          People

          • Assignee: Unassigned
          • Reporter: Fuad Efendi
          • Votes: 0
          • Watchers: 2

              Time Tracking

              • Estimated: 1,680h
              • Remaining: 1,680h
              • Logged: Not Specified
