Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 5.5
Description
Hi,
we are currently seeing that some values calculated by the TV component (TermVectorComponent) are obviously wrong when ExactStatsCache is enabled. The affected value is the shard-wide TV docfreq calculation.
This problem is a follow-up to
SOLR-8459 NPE using TermVectorComponent in combinition with ExactStatsCache
Maybe the problem is trivial and we have simply misconfigured something.
So let's go deeper into the problem:
1) The problem in summary:
==================
We send requests with "tv.df", "tv.tf" and "tv.tf_idf" enabled:
tv.df=true&tv.tf_idf=true&tv.tf=true
Additionally, for debugging purposes, we request the following function values:
termfreq("site_term_maincontent","abakus"),docfreq("site_maincontent_term_wdf","abakus"),ttf("site_maincontent_term_wdf","abakus")
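For reference, the request described above can be assembled programmatically. The following is only a minimal sketch: the helper name `build_tvrh_query` is hypothetical, and it assumes the same field (`site_maincontent_term_wdf`) is used for all three function calls, as in the example URL in section 3.

```python
from urllib.parse import urlencode

FIELD = "site_maincontent_term_wdf"
TERM = "abakus"


def build_tvrh_query(field: str = FIELD, term: str = TERM) -> str:
    """Return the URL-encoded query string for a /tvrh request.

    Hypothetical helper: it only builds the string and does not contact Solr.
    """
    # Function queries requested per document for debugging.
    functions = [
        f"termfreq({field},'{term}')",
        f"docfreq({field},'{term}')",
        f"ttf({field},'{term}')",
    ]
    params = [
        ("q", f"{field}:{term}"),
        ("tv.df", "true"),
        ("tv.tf", "true"),
        ("tv.tf_idf", "true"),
        ("fl", ",".join(functions + ["site_url"])),
        ("shards.qt", "/tvrh"),
        ("shards.info", "true"),
    ]
    return urlencode(params)


if __name__ == "__main__":
    print("/tvrh?" + build_tvrh_query())
```

The resulting string can be appended to the core URL of any replica; `shards.qt=/tvrh` is included so that the shard sub-requests go through the same handler.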
Our findings are:
- tv.tf and termfreq seem to be correct
- tv.df and docfreq are obviously wrong
- tv.tf_idf and ttf are wrong as well, I guess as a consequence of the wrong tv.df (docfreq)
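The dependency of tv.tf_idf on tv.df can be illustrated with a small sketch. It assumes, as far as we understand the TermVectorComponent, that tv.tf_idf is reported as tf divided by df. The tf value of 3 and the wrong df value of 2 are made up for illustration; only the df of 6 comes from this report.

```python
def tf_idf(tf: int, df: int) -> float:
    """Return tf / df, the ratio assumed to be reported as tv.tf_idf."""
    if df <= 0:
        raise ValueError("df must be positive")
    return tf / df


if __name__ == "__main__":
    tf = 3          # hypothetical term frequency in one document
    correct_df = 6  # 6 documents contain "abakus" (see section 3)
    wrong_df = 2    # hypothetical per-shard value, for illustration only

    print(tf_idf(tf, correct_df))
    print(tf_idf(tf, wrong_df))
```

Under this assumption a correct tf combined with a too-small df yields a too-large tf_idf, which would match the observation that tv.tf is fine while tv.tf_idf is not.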
2) What we have:
===========
schema.xml:
...
<field name="site_maincontent_term_wdf" type="text_token_wdf" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
...
<fieldType name="text_token_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
  </analyzer>
</fieldType>
...
solrconfig.xml:
...
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
...
<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="/tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
...
You can find all the details here:
http://149.202.5.192:8820/solr/#/SingleDomainSite_34_shard1_replica1
3) Examples
========
If you open this link, you can see that 6 documents contain the word "abakus" in the field "site_maincontent_term_wdf" ...
But if you look at the "docfreq" field in the returned documents, the value is incorrect and differs from document to document (it should always be the same):
"docfreq(field,term) returns the number of documents that contain the term in the field. This is a constant (the same value for all documents in the index)."
Here is a link with shards.info enabled:
http://149.202.5.192:8820/solr/SingleDomainSite_34_shard1_replica1/tvrh?&wt=xml&q=site_maincontent_term_wdf%3Aabakus&start=0&rows=10&fl=ttf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cdocfreq%28site_maincontent_term_wdf%2C%27abakus%27%29%2Cidf%28site_maincontent_term_wdf%2C%27abakus%27%29%2Csite_url&shards.qt=/tvrh&shards.info=true
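The expectation quoted above, that docfreq is the same for every document, can be checked mechanically against the returned documents. The sketch below is only an illustration: the helper names are hypothetical, and the sample documents and their values are invented placeholders, not output copied from the server.

```python
def distinct_values(docs, key):
    """Collect the distinct values of `key` across the returned documents."""
    return {doc[key] for doc in docs if key in doc}


def is_constant(docs, key):
    """True if `key` has at most one distinct value across all documents."""
    return len(distinct_values(docs, key)) <= 1


if __name__ == "__main__":
    # Response key as it would appear for the fl function query (assumed).
    key = "docfreq(site_maincontent_term_wdf,'abakus')"

    # Invented sample documents, for illustration only.
    docs = [
        {"site_url": "http://example.org/a", key: 6},
        {"site_url": "http://example.org/b", key: 2},
        {"site_url": "http://example.org/c", key: 4},
    ]

    if not is_constant(docs, key):
        print("docfreq varies:", sorted(distinct_values(docs, key)))
```

Run against the `docs` list of a real response parsed from `wt=json`, more than one distinct value would reproduce the problem described here.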
Issue Links
- relates to SOLR-8459 NPE using TermVectorComponent in combinition with ExactStatsCache (Resolved)