Solr / SOLR-711

SimpleFacets: Performance Boost for Tokenized Fields for smaller DocSet using Term Vectors

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: search
    • Labels: None

      Description

      From http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html:

      Scenario:

      • 10,000,000 documents in the index;
      • 5-10 terms per document;
      • 200,000 unique terms for a tokenized field.

      Computing the sizes of 200,000 intersections against the FilterCache (one per unique term) is 100 times slower than traversing the 10 - 20,000 documents of a smaller DocSet and counting term frequencies from their term vectors.
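
The two counting strategies compared above can be sketched with plain Java collections. This is only an illustration, not the code in SimpleFacets.java: the class name `FacetCountSketch`, its methods, and the in-memory maps standing in for the FilterCache and for term vectors are all assumptions made for this example.

```java
import java.util.*;

// Hypothetical illustration; none of these names come from Solr.
class FacetCountSketch {

    // Builds term -> documents (stand-in for per-term cached filters)
    // from document -> terms (stand-in for per-document term vectors).
    static Map<String, Set<Integer>> invert(Map<Integer, List<String>> termVectors) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (Map.Entry<Integer, List<String>> e : termVectors.entrySet()) {
            for (String term : e.getValue()) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return postings;
    }

    // Strategy A: one intersection per unique term in the field.
    // Work grows with the number of unique terms, whatever the DocSet size.
    static Map<String, Integer> countByIntersection(
            Map<String, Set<Integer>> postings, Set<Integer> docSet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            int n = 0;
            for (Integer doc : e.getValue()) {
                if (docSet.contains(doc)) n++;
            }
            if (n > 0) counts.put(e.getKey(), n);
        }
        return counts;
    }

    // Strategy B: visit only the documents in the DocSet and count their terms.
    // Work grows with (DocSet size x terms per document).
    // Assumes each term is listed at most once per document.
    static Map<String, Integer> countByTermVectors(
            Map<Integer, List<String>> termVectors, Set<Integer> docSet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Integer doc : docSet) {
            for (String term : termVectors.getOrDefault(doc, Collections.emptyList())) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> termVectors = new HashMap<>();
        termVectors.put(1, Arrays.asList("apache", "solr"));
        termVectors.put(2, Arrays.asList("apache", "lucene"));
        termVectors.put(3, Arrays.asList("solr", "facet"));

        Set<Integer> docSet = new HashSet<>(Arrays.asList(1, 3));
        // Both strategies print {apache=1, facet=1, solr=2}
        System.out.println(countByIntersection(invert(termVectors), docSet));
        System.out.println(countByTermVectors(termVectors, docSet));
    }
}
```

Both methods return the same counts; they differ only in what they iterate over, which is why the second one can win when the DocSet is small and the field has many unique terms.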

      This optimization does not apply when the size of the DocSet is close to the total number of unique tokens (200,000 in our scenario).
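
One way to express this applicability condition is a simple size check before faceting. The sketch below is a guess at how such a check could look, not something taken from the issue or from Solr: the class `FacetStrategyChooser`, the enum values, and the `maxRatio` cutoff are all hypothetical, and a real cutoff would have to be found by measurement.

```java
// Hypothetical illustration; none of these names come from Solr.
class FacetStrategyChooser {

    enum Strategy { TERM_INTERSECTION, TERM_VECTOR_SCAN }

    // Picks the document scan only when the DocSet is clearly smaller than
    // the number of unique terms. maxRatio is an assumed tuning knob.
    static Strategy choose(long docSetSize, long uniqueTerms, double maxRatio) {
        return docSetSize < uniqueTerms * maxRatio
                ? Strategy.TERM_VECTOR_SCAN
                : Strategy.TERM_INTERSECTION;
    }

    public static void main(String[] args) {
        // Scenario from the description: 200,000 unique terms.
        System.out.println(choose(20_000, 200_000, 0.5));   // TERM_VECTOR_SCAN
        System.out.println(choose(190_000, 200_000, 0.5));  // TERM_INTERSECTION
    }
}
```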

      See SimpleFacets.java:

      public NamedList getFacetTermEnumCounts(
        SolrIndexSearcher searcher, 
        DocSet docs, ...
      

            People

            • Assignee: Unassigned
            • Reporter: Fuad Efendi (funtick)
            • Votes: 0
            • Watchers: 2

                Time Tracking

                Original Estimate: 1,680h
                Remaining Estimate: 1,680h
                Time Spent: Not Specified