Solr / SOLR-2584

Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 3.3, 4.0-ALPHA
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: None
    • Labels:

      Description

      Hi folks,

      I think that UIMAUpdateRequestProcessor should have a parameter to avoid duplicate values on the updated field.

      A typical use case is:

      If you are using DictionaryAnnotator and a term matches more than once, it is added to the mapped field once per match. I think we should add a parameter to avoid inserting duplicates, since we are not preserving any information about the position of the annotation.

      What do you think? I've already implemented this for branch 3x. I'm writing some tests and will submit a patch.

      Regards

      1. SOLR-2584.patch (11 kB, Koji Sekiguchi)
      2. SOLR-2584.patch (10 kB, Elmer Garduno)
      3. SOLR-2584.patch (9 kB, Elmer Garduno)

        Activity

        Koji Sekiguchi added a comment -

        Alternatively, we could implement this in a new update processor and place it after the UIMA update processor in the chain.

        Either way, I would like to have this feature.
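
To illustrate the chaining idea, here is a minimal Java sketch. It does not use Solr's API: `FieldStep`, `ChainSketch`, `ANNOTATE`, and `UNIQ` are hypothetical names invented for this example, and the annotating step only simulates a dictionary match that fires twice on the same term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an update processor: each step receives the
// values of one field and returns the values to pass to the next step.
interface FieldStep {
    List<String> apply(List<String> values);
}

class ChainSketch {
    // Simulated annotating step that emits the same term more than once.
    static final FieldStep ANNOTATE = values -> {
        List<String> out = new ArrayList<>(values);
        out.addAll(Arrays.asList("solr", "lucene", "solr")); // "solr" matched twice
        return out;
    };

    // De-duplicating step, meant to be placed after the annotating step.
    static final FieldStep UNIQ = values -> {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (!out.contains(v)) {
                out.add(v); // keep only the first occurrence
            }
        }
        return out;
    };

    // Runs the steps in order, feeding each step's output to the next one.
    static List<String> run(List<String> input, FieldStep... chain) {
        List<String> current = input;
        for (FieldStep step : chain) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(run(new ArrayList<>(), ANNOTATE, UNIQ)); // prints [solr, lucene]
    }
}
```

Keeping de-duplication in its own step means it can follow any processor that may produce repeated values, not only the UIMA one.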

        Elmer Garduno added a comment -

        Koji, I followed your approach and implemented it using an UpdateRequestProcessor.

        I'm submitting the patch for branch 3x.

        Elmer Garduno added a comment -

        UniqFieldsUpdateProcessor removes duplicate values from the specified fields. It is useful after an UpdateRequestProcessor that could generate duplicate values for a field.

        Elmer Garduno added a comment -

        Added test cases and fixed an error.

        Koji Sekiguchi added a comment -

        Thanks Elmer for the patch!

        I made some fixes in the attached patch:

        • removed the unused checkNumDocs() from the test
        • used <lst/> for the fields parameter
        • used a List instead of a Set in processAdd() to keep the order of values in a multiValued field, and added a check for this to the test case
        • added a null check in processAdd(), as fields could be null
        • added prettify to the javadoc
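
The List-based approach and the null check described above can be sketched roughly as follows. This is not the code from the patch: the document is modelled as a plain `Map` rather than Solr's document class, and `UniqFieldsSketch` and `uniq` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UniqFieldsSketch {
    // Hypothetical document model: field name -> values in insertion order.
    // 'fields' lists the field names to de-duplicate and may be null.
    static void uniq(Map<String, List<Object>> doc, List<String> fields) {
        if (fields == null) {
            return; // no fields configured, nothing to do
        }
        for (String field : fields) {
            List<Object> values = doc.get(field);
            if (values == null) {
                continue; // field absent from this document
            }
            // A List keeps the original order of first occurrences,
            // which a plain hash-based Set would not guarantee.
            List<Object> kept = new ArrayList<>();
            for (Object v : values) {
                if (!kept.contains(v)) {
                    kept.add(v);
                }
            }
            doc.put(field, kept);
        }
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("entity", new ArrayList<Object>(Arrays.<Object>asList("b", "a", "b", "c", "a")));
        uniq(doc, Arrays.asList("entity", "missing"));
        System.out.println(doc.get("entity")); // prints [b, a, c]
    }
}
```

The `contains` scan is quadratic in the number of values, which is usually acceptable for the short value lists of a multiValued field; a `LinkedHashSet` would be an order-preserving alternative for larger lists.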
        Elmer Garduno added a comment -

        Thanks Koji

        Koji Sekiguchi added a comment -

        Committed in trunk and 3x.

        Robert Muir added a comment -

        Bulk close for 3.4.


          People

          • Assignee: Koji Sekiguchi
          • Reporter: Elmer Garduno