SOLR-1869

RemoveDuplicatesTokenFilter doesn't have expected behaviour

Details

    Description

      The RemoveDuplicatesTokenFilter seems broken: it initializes its map and attributes at the class level rather than in its constructor.
      In addition, I would expect it to remove identical terms that have the same offset positions. Instead, it appears to remove duplicates based on position increment, which won't work when it is used after something like the EdgeNGram filter. When I posted this to the mailing list, even Erik Hatcher seemed to think that's what this filter was supposed to do...

      Attaching a patch that has the expected behaviour and initializes the variables in the constructor.
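
To make the difference between the two behaviours concrete, here is a minimal, self-contained sketch. It does not use the Lucene classes and is not the attached patch: `Token`, `dedupByPosition`, `dedupByOffset` and `sampleStream` are hypothetical names, and the sample tokens are made up for illustration rather than taken from real EdgeNGram output. It only assumes that position-based deduplication compares terms within a single position (increment 0), as the report describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    /** Hypothetical stand-in for a token's term, offset and position-increment attributes. */
    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;
        final int positionIncrement;

        Token(String term, int startOffset, int endOffset, int positionIncrement) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Drops a token only when the same term was already seen at the current position. */
    static List<Token> dedupByPosition(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : in) {
            if (t.positionIncrement > 0) {
                seenAtPosition.clear(); // a new position starts, so earlier terms are forgotten
            }
            if (seenAtPosition.add(t.term)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Drops a token when the same term with the same start and end offsets was already kept. */
    static List<Token> dedupByOffset(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Token t : in) {
            String key = t.term + "@" + t.startOffset + "-" + t.endOffset;
            if (seen.add(key)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Made-up stream: "fo" appears three times with identical offsets. */
    static List<Token> sampleStream() {
        return Arrays.asList(
            new Token("fo", 0, 2, 1),
            new Token("fo", 0, 2, 0),  // same position as the previous token
            new Token("fo", 0, 2, 1),  // same offsets, but a new position
            new Token("foo", 0, 3, 1));
    }

    public static void main(String[] args) {
        System.out.println(dedupByPosition(sampleStream()).size()); // prints 3
        System.out.println(dedupByOffset(sampleStream()).size());   // prints 2
    }
}
```

In this sketch the position-based version still lets the third "fo" through because its increment is greater than zero, while the offset-based version keeps only one "fo" and one "foo".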

      Attachments

        1. SOLR-1869.patch
          3 kB
          Joe Calderon
        2. RemoveDupOffsetTokenFilterFactory.java
          0.3 kB
          Joe Calderon
        3. RemoveDupOffsetTokenFilter.java
          3 kB
          Joe Calderon

          People

            Assignee: Unassigned
            Reporter: Joe Calderon