  1. Lucene - Core
  2. LUCENE-8380

UTF8TaxonomyWriterCache inconsistency

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 7.1
    • Fix Version/s: 7.5
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I'm seeing an inconsistency in the taxonomy writer cache. At some point UTF8TaxonomyWriterCache starts returning the wrong ord for some facet labels. As a result, wrong ords are written to the documents' facet fields, and searches return wrong counts (undercounts). The bug shows up on different servers with different index contents (we have several separate indexes, each with unique data).
      Unfortunately, I can't reproduce this behaviour in tests.
      I've dumped a "broken" UTF8TaxonomyWriterCache instance and created an app that loads it and compares it with the real taxonomy. The dumps and the app are attached. To run the demo, extract the archives and execute:

      mvn compile
      mvn exec:java -Dexec.mainClass="me.torobaev.lucene.taxonomy.cache.TaxonomyCacheCheck" -DtaxonomyDir=../taxonomy/ -DcacheDump=../taxonomy-cache.json
      

      As you can see, the labels [frametype, 7] and [modification_id, 682] have the same ord in the cache.
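
      For readers without the attachments, here is a minimal sketch of the kind of check described above. It is not the attached TaxonomyCacheCheck app. It assumes the cache dump and the real taxonomy have each already been loaded into a plain label-to-ord map. The class name, the method names, the "dim/value" label strings, and the ord values are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or of the attached demo app.
public class OrdinalCollisionCheck {

    /** Groups labels by ord and keeps only the ords claimed by more than one label. */
    public static Map<Integer, List<String>> findCollisions(Map<String, Integer> labelToOrd) {
        Map<Integer, List<String>> byOrd = new TreeMap<>();
        for (Map.Entry<String, Integer> e : labelToOrd.entrySet()) {
            byOrd.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // A healthy cache maps every ord to exactly one label.
        byOrd.values().removeIf(labels -> labels.size() < 2);
        return byOrd;
    }

    /** Returns the labels whose cached ord differs from the ord in the reference taxonomy. */
    public static List<String> findMismatches(Map<String, Integer> cached,
                                              Map<String, Integer> reference) {
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, Integer> e : cached.entrySet()) {
            Integer expected = reference.get(e.getKey());
            if (expected != null && !expected.equals(e.getValue())) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        // Made-up ords: the real values are in the attached dump, not in this text.
        Map<String, Integer> cached = new LinkedHashMap<>();
        cached.put("frametype/7", 101);
        cached.put("modification_id/682", 101); // same ord as above: a collision
        cached.put("color/red", 7);

        Map<String, Integer> reference = new LinkedHashMap<>();
        reference.put("frametype/7", 101);
        reference.put("modification_id/682", 250);
        reference.put("color/red", 7);

        System.out.println("Colliding ords: " + findCollisions(cached));
        System.out.println("Mismatched labels: " + findMismatches(cached, reference));
    }
}
```

      The collision check needs only the cache dump, so it can flag a broken cache even when the on-disk taxonomy is not available. The mismatch check additionally shows which of the colliding labels holds the wrong ord.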

        Attachments

        1. LUCENE-8380.patch
          5 kB
          Dawid Weiss
        2. lucene-taxonomy-cache-report.tar.gz
          4 kB
          Ruslan Torobaev
        3. taxonomy.tar.gz
          1.28 MB
          Ruslan Torobaev
        4. taxonomy-cache.json.gz
          1.25 MB
          Ruslan Torobaev


            People

            • Assignee:
              dweiss Dawid Weiss
              Reporter:
              ruslan.torobaev Ruslan Torobaev
            • Votes:
              0
              Watchers:
              5
