Solr / SOLR-4824

Fuzzy / Faceting results are changed after ingestion of documents past a certain number


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.2, 4.3
    • Fix Version/s: None
    • Component/s: search
    • Labels: None
    • Environment: Ubuntu 12.04 LTS 12.04.2
      jre1.7.0_17
      jboss-as-7.1.1.Final

    Description

      While upgrading from Solr 3.6 to 4.2/4.3 and comparing results on fuzzy queries, I found that once a certain number of documents had been ingested, the fuzzy query returned drastically fewer results. We ingest approximately 18,000 documents per day. After approximately 40 days of documents have been ingested, adding the next day's documents causes the same fuzzy search to return far fewer results.
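      As a rough worked estimate, the two figures above put the threshold at about 720,000 documents. The sketch below only multiplies the approximate numbers quoted in this report; the constant and function names are illustrative, and the exact document count at which the drop occurs was not measured here.

```python
# Figures quoted in the report; both are approximate.
DOCS_PER_DAY = 18_000
DAYS_BEFORE_DROP = 40


def estimated_threshold(docs_per_day, days):
    """Return the rough document count at which the drop was observed."""
    return docs_per_day * days


threshold = estimated_threshold(DOCS_PER_DAY, DAYS_BEFORE_DROP)
print(threshold)
```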

      The query: http://10.100.1.xx:8080/solr/corex/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort

      produces the following result before the threshold is crossed:

      <response><lst name="responseHeader">
      <int name="status">0</int><int name="QTime">2349</int><lst name="params"><str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
      <str name="q">cc:worde~1</str><str name="facet.field">date</str></lst></lst><result name="response" numFound="362803" start="0"></result>
      <lst name="facet_counts"><lst name="facet_queries"/><lst name="facet_fields"><lst name="date">
      <int name="2012-12-31">2866</int>
      <int name="2013-01-01">11372</int>
      <int name="2013-01-02">11514</int>
      <int name="2013-01-03">12015</int>
      <int name="2013-01-04">11746</int>
      <int name="2013-01-05">10853</int>
      <int name="2013-01-06">11053</int>
      <int name="2013-01-07">11815</int>
      <int name="2013-01-08">11427</int>
      <int name="2013-01-09">11475</int>
      <int name="2013-01-10">11461</int>
      <int name="2013-01-11">12058</int>
      <int name="2013-01-12">11335</int>
      <int name="2013-01-13">12039</int>
      <int name="2013-01-14">12064</int>
      <int name="2013-01-15">12234</int>
      <int name="2013-01-16">12545</int>
      <int name="2013-01-17">11766</int>
      <int name="2013-01-18">12197</int>
      <int name="2013-01-19">11414</int>
      <int name="2013-01-20">11633</int>
      <int name="2013-01-21">12863</int>
      <int name="2013-01-22">12378</int>
      <int name="2013-01-23">11947</int>
      <int name="2013-01-24">11822</int>
      <int name="2013-01-25">11882</int>
      <int name="2013-01-26">10474</int>
      <int name="2013-01-27">11051</int>
      <int name="2013-01-28">11776</int>
      <int name="2013-01-29">11957</int>
      <int name="2013-01-30">11260</int>
      <int name="2013-01-31">8511</int>
      </lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>

      Once the threshold of 40 days of ingested documents is crossed, the results for the same query drop as shown below:

      <response><lst name="responseHeader">
      <int name="status">0</int><int name="QTime">2</int><lst name="params"><str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/><str name="q">cc:worde~1</str><str name="facet.field">date</str></lst></lst>
      <result name="response" numFound="1338" start="0"></result>
      <lst name="facet_counts"><lst name="facet_queries"/><lst name="facet_fields"><lst name="date">
      <int name="2012-12-31">0</int>
      <int name="2013-01-01">41</int>
      <int name="2013-01-02">21</int>
      <int name="2013-01-03">24</int>
      <int name="2013-01-04">19</int>
      <int name="2013-01-05">9</int>
      <int name="2013-01-06">11</int>
      <int name="2013-01-07">17</int>
      <int name="2013-01-08">14</int>
      <int name="2013-01-09">24</int>
      <int name="2013-01-10">43</int>
      <int name="2013-01-11">14</int>
      <int name="2013-01-12">52</int>
      <int name="2013-01-13">57</int>
      <int name="2013-01-14">25</int>
      <int name="2013-01-15">17</int>
      <int name="2013-01-16">34</int>
      <int name="2013-01-17">11</int>
      <int name="2013-01-18">16</int>
      <int name="2013-01-19">121</int>
      <int name="2013-01-20">33</int>
      <int name="2013-01-21">26</int>
      <int name="2013-01-22">59</int>
      <int name="2013-01-23">27</int>
      <int name="2013-01-24">10</int>
      <int name="2013-01-25">9</int>
      <int name="2013-01-26">6</int>
      <int name="2013-01-27">16</int>
      <int name="2013-01-28">11</int>
      <int name="2013-01-29">15</int>
      <int name="2013-01-30">21</int>
      <int name="2013-01-31">109</int>
      <int name="2013-02-01">11</int>
      <int name="2013-02-02">7</int>
      <int name="2013-02-03">10</int>
      <int name="2013-02-04">8</int>
      <int name="2013-02-05">13</int>
      <int name="2013-02-06">75</int>
      <int name="2013-02-07">77</int>
      <int name="2013-02-08">31</int>
      <int name="2013-02-09">35</int>
      <int name="2013-02-10">22</int>
      <int name="2013-02-11">18</int>
      <int name="2013-02-12">11</int>
      <int name="2013-02-13">68</int>
      <int name="2013-02-14">40</int>
      </lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>

      I have also tested this with different months of data and have seen the same issue at around the same number of documents.
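      For anyone trying to reproduce this, the comparison can be scripted. In both responses above the per-date facet counts add up to numFound (362803 and 1338), so summing the facet buckets is a quick consistency check when comparing an index before and after the threshold. The following is a minimal sketch, not part of the original report: the function names are hypothetical, the host keeps the masked "10.100.1.xx" placeholder from the query above, and SAMPLE is an abbreviated excerpt of the second response, so its three buckets do not sum to numFound.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlparse, parse_qs

# Abbreviated excerpt of the second response in this report:
# only the first three date buckets are kept.
SAMPLE = """<response>
<result name="response" numFound="1338" start="0"></result>
<lst name="facet_counts"><lst name="facet_fields"><lst name="date">
<int name="2012-12-31">0</int>
<int name="2013-01-01">41</int>
<int name="2013-01-02">21</int>
</lst></lst></lst>
</response>"""


def build_query_url(base, term="cc:worde~1", field="date"):
    """Build the select URL with the same parameters as the reported query."""
    params = {
        "q": term,
        "facet": "on",
        "facet.field": field,
        "fl": field,
        "facet.sort": "",  # the reported query passes facet.sort with no value
    }
    return base + "?" + urlencode(params)


def summarize(xml_text, field="date"):
    """Return (numFound, {facet value: count}) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    num_found = int(root.find("./result[@name='response']").get("numFound"))
    path = (
        "./lst[@name='facet_counts']/lst[@name='facet_fields']"
        "/lst[@name='%s']/int" % field
    )
    counts = {node.get("name"): int(node.text) for node in root.findall(path)}
    return num_found, counts


# Placeholder host copied as-is from the report; replace before use.
url = build_query_url("http://10.100.1.xx:8080/solr/corex/select")
num_found, counts = summarize(SAMPLE)
print(url)
print(num_found, sum(counts.values()))
```

      Running summarize on the full response saved before and after ingesting the next day of documents gives the two per-date dictionaries, which can then be compared bucket by bucket.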

            People

              Assignee: Unassigned
              Reporter: Lakshmi Venkataswamy (lvenkataswamy)
              Votes: 1
              Watchers: 4
