SOLR-669

SOLR currently does not support caching for (Query, FacetFieldList)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This is a huge performance bottleneck, and it explains the large difference between QTime and SolrJ's elapsedTime. I quickly browsed SolrIndexSearcher: it caches only (Key, DocSet/DocList <Lucene Ids>) key-value pairs and has no cache for (Query, FacetFieldList).
      The filterCache stores a DocSet for each 'filter', so facet counts are constantly recalculated from it...

      Caching the facet results would be a significant performance improvement.
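
A minimal sketch of what such a cache could look like, assuming an LRU map keyed by the query string plus the list of requested facet fields. The class `FacetCountCache` and its methods are hypothetical and are not part of Solr; a real implementation would also have to be invalidated whenever a new searcher is opened, as Solr's existing caches are.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical LRU cache of facet counts keyed by (query, facet field list). Not a Solr class. */
final class FacetCountCache {

    /** Cache key: the query string plus the requested facet fields, in request order. */
    static final class Key {
        private final String query;
        private final List<String> facetFields;

        Key(String query, List<String> facetFields) {
            this.query = Objects.requireNonNull(query);
            this.facetFields = Collections.unmodifiableList(new ArrayList<>(facetFields));
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return query.equals(other.query) && facetFields.equals(other.facetFields);
        }

        @Override
        public int hashCode() {
            return Objects.hash(query, facetFields);
        }
    }

    // field name -> (term -> count), stored per key in access order so the eldest entry is evicted first
    private final Map<Key, Map<String, Map<String, Integer>>> lru;

    FacetCountCache(final int maxEntries) {
        this.lru = new LinkedHashMap<Key, Map<String, Map<String, Integer>>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, Map<String, Map<String, Integer>>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached counts, or runs the supplier once and remembers its result. */
    synchronized Map<String, Map<String, Integer>> getOrCompute(
            String query, List<String> facetFields, Supplier<Map<String, Map<String, Integer>>> compute) {
        Key key = new Key(query, facetFields);
        Map<String, Map<String, Integer>> cached = lru.get(key);
        if (cached == null) {
            cached = compute.get();
            lru.put(key, cached);
        }
        return cached;
    }

    /** Must be called when the index view changes; cached counts are only valid for one searcher. */
    synchronized void clear() {
        lru.clear();
    }

    synchronized int size() {
        return lru.size();
    }
}
```

With this shape, a repeated request for the same query and facet fields skips the per-term recalculation entirely, at the cost of memory for each cached result.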

        Activity

        Fuad Efendi added a comment -

        This piece of code in SimpleFacets:

            if (sf.multiValued() || ft.isTokenized() || ft instanceof BoolField) {
              // Always use filters for booleans... we know the number of values is very small.
              counts = getFacetTermEnumCounts(searcher, docs, field, offset, limit, mincount,missing,sort,prefix);
            } else {
              // TODO: future logic could use filters instead of the fieldcache if
              // the number of terms in the field is small enough.
              counts = getFieldCacheCounts(searcher, docs, field, offset,limit, mincount, missing, sort, prefix);
            }
        
        • The else branch is an optimization for single-valued, non-tokenized fields: it uses the 'Lucene FieldCache to get counts for each unique field value in docs'.
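
To illustrate the FieldCache-style path for a single-valued field, here is a rough sketch that assumes each document maps to at most one term ordinal. The class name, method name, and array layout are illustrative assumptions, not Solr's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: counting facet values for a single-valued field from a per-document ordinal array. */
final class FieldCacheCountSketch {

    /**
     * @param docToOrd docToOrd[doc] is the index into terms of that document's value, or -1 if it has none
     * @param terms    the unique values of the field
     * @param docs     ids of the documents that match the query
     */
    static Map<String, Integer> countByValue(int[] docToOrd, String[] terms, int[] docs) {
        int[] perTerm = new int[terms.length];
        for (int doc : docs) {
            int ord = docToOrd[doc];
            if (ord >= 0) {
                perTerm[ord]++; // one array lookup per matching document
            }
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            if (perTerm[i] > 0) {
                counts.put(terms[i], perTerm[i]);
            }
        }
        return counts;
    }
}
```

The per-document array is what gets cached in this approach; the resulting counts are still recomputed for every request.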

        We should implement additional caching for the path that uses the FilterCache to get the intersection; the FilterCache stores only DocSets and does not store the NamedList of field intersection counts:

            /**
             * Returns a list of terms in the specified field along with the
             * corresponding count of documents in the set that match that constraint.
             * This method uses the FilterCache to get the intersection count between <code>docs</code>
             * and the DocSet for each term in the filter.
             *
             * @see FacetParams#FACET_LIMIT
             * @see FacetParams#FACET_ZEROS
             * @see FacetParams#FACET_MISSING
             */
            public NamedList getFacetTermEnumCounts(SolrIndexSearcher searcher, DocSet docs, String field, int offset, int limit, int mincount, boolean missing, boolean sort, String prefix)
              throws IOException {
              ...
            }
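
The intersection counting described in that Javadoc can be sketched roughly as follows, using `java.util.BitSet` as a stand-in for Solr's DocSet. The class and method names are hypothetical; the point is only that the per-term sets can be cached while the resulting counts are rebuilt on every call.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: term-enumeration faceting with BitSet standing in for a DocSet. */
final class TermEnumFacetSketch {

    /**
     * For each term, counts the documents in docs that also appear in that term's document set.
     * Terms whose count is below mincount are omitted.
     */
    static Map<String, Integer> intersectionCounts(BitSet docs, Map<String, BitSet> termDocSets, int mincount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : termDocSets.entrySet()) {
            BitSet both = (BitSet) e.getValue().clone(); // copy so the (cached) per-term set is not mutated
            both.and(docs);
            int count = both.cardinality();
            if (count >= mincount) {
                counts.put(e.getKey(), count);
            }
        }
        return counts;
    }
}
```

The work here grows with the number of unique terms in the field, which is why caching the finished counts per (query, field list) is attractive for repeated requests.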
        
        Fuad Efendi added a comment -

        To confirm:

        • SOLR uses Lucene internals (with caching) only if the field is non-tokenized, single-valued, and non-boolean, and SOLR does not have its own cache to store the calculated intersections (faceting).
        Jan Høydahl added a comment -

        Closing old issue, please re-open if necessary.

        Gunnar Wagenknecht added a comment -

        This bug is closed as duplicate but I can't actually see a link to the other issue this one duplicates. It would be nice if such a link can be added.

        Jan Høydahl added a comment -

        Changed resolution state to "Won't fix". It appears this is not a feature anyone finds useful enough to even comment on, far less contribute to for almost 5 years, so to me that's a theoretical need, not a real one. Please re-open if you (or anyone else) want to see this solved.


          People

          • Assignee: Unassigned
          • Reporter: Fuad Efendi
          • Votes: 0
          • Watchers: 2


              Time Tracking

              • Original Estimate: 1,680h
              • Remaining Estimate: 1,680h
              • Time Spent: Not Specified
