
| Key: | SOLR-711 |
| Type: | Improvement |
| Status: | Closed |
| Resolution: | Fixed |
| Priority: | Major |
| Assignee: | Unassigned |
| Reporter: | Fuad Efendi |
| Votes: | 0 |
| Watchers: | 2 |
Time Tracking:
| Original Estimate: | 1680h |
| Remaining Estimate: | 1680h |
| Time Spent: | Not Specified |
| Resolution Date: | 17/Dec/08 04:48 PM |
|
Description
|
From http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html :
Scenario:
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
Computing the sizes of 200,000 intersections against the FilterCache is about 100 times slower than traversing the 10 - 20,000 documents of a smaller DocSet and counting term frequencies directly.
This optimization does not apply when the size of the DocSet is close to the total number of unique tokens (200,000 in our scenario).
See SimpleFacets.java:
public NamedList getFacetTermEnumCounts(
    SolrIndexSearcher searcher,
    DocSet docs, ...
|
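To make the comparison in the description concrete, here is a minimal sketch of the two counting strategies. It does not use Solr classes: `java.util.BitSet` stands in for a DocSet, an in-memory map stands in for the per-term filters, and every name in it (`FacetCountSketch`, `buildPostings`, `countByTermIntersection`, `countByDocTraversal`, `preferDocTraversal`, the `margin` parameter) is hypothetical and chosen for illustration only. It is not the code in SimpleFacets.java, and the selection rule is an assumption, not a measured threshold.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only; none of these names exist in Solr.
class FacetCountSketch {

    // Build term -> set of doc ids (a stand-in for one cached filter per term).
    static Map<String, BitSet> buildPostings(List<Set<String>> termsPerDoc) {
        Map<String, BitSet> postings = new TreeMap<>();
        for (int doc = 0; doc < termsPerDoc.size(); doc++) {
            for (String term : termsPerDoc.get(doc)) {
                postings.computeIfAbsent(term, t -> new BitSet()).set(doc);
            }
        }
        return postings;
    }

    // Strategy A: one intersection per unique term. Work grows with the
    // number of unique terms, regardless of how small the DocSet is.
    static Map<String, Integer> countByTermIntersection(Map<String, BitSet> postings, BitSet docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : postings.entrySet()) {
            BitSet hits = (BitSet) e.getValue().clone();
            hits.and(docs);
            int n = hits.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    // Strategy B: walk only the documents in the DocSet and tally their terms.
    // Work grows with DocSet size times terms per document.
    static Map<String, Integer> countByDocTraversal(List<Set<String>> termsPerDoc, BitSet docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = docs.nextSetBit(0); doc >= 0; doc = docs.nextSetBit(doc + 1)) {
            for (String term : termsPerDoc.get(doc)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Assumed rule of thumb: traverse documents only when the DocSet is
    // well below the number of unique terms. 'margin' is a made-up tuning knob.
    static boolean preferDocTraversal(int docSetSize, int uniqueTerms, int margin) {
        return docSetSize < uniqueTerms / margin;
    }

    public static void main(String[] args) {
        List<Set<String>> termsPerDoc = new ArrayList<>();
        termsPerDoc.add(Set.of("a", "b"));
        termsPerDoc.add(Set.of("a"));
        termsPerDoc.add(Set.of("b", "c"));
        termsPerDoc.add(Set.of("c"));

        BitSet docs = new BitSet();
        docs.set(0);
        docs.set(2);

        System.out.println(countByTermIntersection(buildPostings(termsPerDoc), docs));
        System.out.println(countByDocTraversal(termsPerDoc, docs));
    }
}
```

Both methods return the same counts for the same DocSet; they differ only in what drives the cost. Strategy B assumes the terms of each document can be read cheaply, which the sketch models with an in-memory list.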
made changes - 19/Aug/08 08:01 PM
| Field |
Original Value |
New Value |
|
Description
|
From [url]http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html[/url]:
Scenario:
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
_Obviously calculating sizes of 200,000 intersections with FilterCache is 100 times slower than traversing 10 - 20,000 documents for smaller DocSets and counting frequencies of Terms._
Not applicable if size of DocSet is close to total number of unique tokens (200,000 in our scenario).
See SimpleFacets:
{{
public NamedList getFacetTermEnumCounts(
SolrIndexSearcher searcher,
DocSet docs,
String field,
int offset,
int limit,
int mincount,
boolean missing,
boolean sort,
String prefix)
throws IOException {...}
}}
|
From [http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html]:
Scenario:
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
_Obviously calculating sizes of 200,000 intersections with FilterCache is 100 times slower than traversing 10 - 20,000 documents for smaller DocSets and counting frequencies of Terms._
Not applicable if size of DocSet is close to total number of unique tokens (200,000 in our scenario).
See SimpleFacets:
{code:title=SimpleFacets.java|borderStyle=solid}
public NamedList getFacetTermEnumCounts(
SolrIndexSearcher searcher,
DocSet docs, ...
{code}
|
made changes - 19/Aug/08 08:01 PM
| Comment | [ trivial formatting ] | |
made changes - 19/Aug/08 08:02 PM
|
Description
|
From [http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html]:
Scenario:
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
_Obviously calculating sizes of 200,000 intersections with FilterCache is 100 times slower than traversing 10 - 20,000 documents for smaller DocSets and counting frequencies of Terms._
Not applicable if size of DocSet is close to total number of unique tokens (200,000 in our scenario).
See SimpleFacets:
{code:title=SimpleFacets.java|borderStyle=solid}
public NamedList getFacetTermEnumCounts(
SolrIndexSearcher searcher,
DocSet docs, ...
{code}
|
From [http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html]:
Scenario:
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
_Obviously calculating sizes of 200,000 intersections with FilterCache is 100 times slower than traversing 10 - 20,000 documents for smaller DocSets and counting frequencies of Terms._
Not applicable if size of DocSet is close to total number of unique tokens (200,000 in our scenario).
See SimpleFacets.java:
{code}
public NamedList getFacetTermEnumCounts(
SolrIndexSearcher searcher,
DocSet docs, ...
{code}
|
made changes - 17/Dec/08 04:48 PM
| Status | Open [ 1 ] | Closed [ 6 ] |
| Resolution | | Fixed [ 1 ] |