  Solr / SOLR-14293

Payloads Are Written or Read Incorrectly - Across the Documents

Details

    Description

      I noticed unexpected payload behavior in Solr 6.3.0, and also in 7.7.2 and 8.3.1. After writing a Lucene62Codec-specific unit test (see attached; it can also be run against the later versions), I think there may be a bug that lets a term's payload from one document be written into the same term's payload in another document (or that causes the second document's payload to be read back incorrectly).
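
      The suspected failure can be phrased as a check: every document should read back the payload it was indexed with, and a mismatch that equals another document's payload points to cross-document mixing. Below is a minimal, stdlib-only sketch of such a check. The names PayloadCheck and findMismatches are hypothetical and are not taken from the attached TestPayloads.java; the maps are assumed to hold one payload per document for a single term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; not part of the attached TestPayloads.java. */
final class PayloadCheck {

  /**
   * Compares the payload indexed for a term in each document (expected)
   * with the payload read back for that document (actual).
   * Returns one message per mismatching document, in docId order.
   */
  static List<String> findMismatches(Map<Integer, byte[]> expected,
                                     Map<Integer, byte[]> actual) {
    // Sort by docId so the report order is deterministic.
    TreeMap<Integer, byte[]> sorted = new TreeMap<>(expected);
    List<String> problems = new ArrayList<>();
    for (Map.Entry<Integer, byte[]> e : sorted.entrySet()) {
      int docId = e.getKey();
      byte[] read = actual.get(docId);
      if (Arrays.equals(e.getValue(), read)) {
        continue; // this document read back its own payload
      }
      // Check whether the wrong payload was indexed for another document.
      Integer source = null;
      for (Map.Entry<Integer, byte[]> other : sorted.entrySet()) {
        if (other.getKey() != docId && Arrays.equals(other.getValue(), read)) {
          source = other.getKey();
          break;
        }
      }
      problems.add(source == null
          ? "doc " + docId + ": payload differs from what was indexed"
          : "doc " + docId + ": read the payload indexed for doc " + source);
    }
    return problems;
  }
}
```

      An empty result means each document returned its own payload. A message naming another document corresponds to the behavior described above.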
       
      For comparison, I added SimpleTextCodec, which does not behave this way.
       
      For 8.3.1, you will need to change MultiFields.getTermPositionsEnum(...) to MultiTerms.getTermPostingsEnum(...).
       
      Thanks to Alan Woodward, I changed the analyzer to address the sharing of TokenStreamComponents in the TestPayloads class. I now use a non-mocked tokenizer and a new filter that creates a random payload (see attached), so documents one and two have the same token but different payloads.
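
      For readers without the attachment, a filter of this kind might look roughly like the sketch below. It is an assumption about the general shape, not the code in TestPayloads.java: the class name RandomPayloadFilter, the injected Random, and the payloadLength parameter are all hypothetical. It requires lucene-core on the classpath.

```java
import java.io.IOException;
import java.util.Random;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

/** Hypothetical filter; the one attached to this issue may differ. */
final class RandomPayloadFilter extends TokenFilter {
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
  private final Random random;
  private final int payloadLength;

  RandomPayloadFilter(TokenStream input, Random random, int payloadLength) {
    super(input);
    this.random = random;
    this.payloadLength = payloadLength;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false; // upstream tokenizer is exhausted
    }
    // Allocate a fresh array per token so no payload bytes are shared
    // between tokens or between documents.
    byte[] bytes = new byte[payloadLength];
    random.nextBytes(bytes);
    payloadAtt.setPayload(new BytesRef(bytes));
    return true;
  }
}
```

      Such a filter would be wrapped around the tokenizer inside Analyzer.createComponents(...). The test also needs to record the payload generated for each document so it can be compared with what is read back.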

      The result is the same: SimpleTextCodec passes the test, but the following codecs do not:

      Lucene50Codec;
      Lucene54Codec;
      Lucene62Codec;
      Lucene70Codec;
      Lucene80Codec; 
       
       

      Attachments

        1. TestPayloads.java
          7 kB
          Ivan Provalov

          People

            Assignee: Unassigned
            Reporter: Ivan Provalov (iprovalo)