Solr / SOLR-2584

Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 3.3, 4.0-ALPHA
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: None
    • Labels:

      Description

      Hi folks,

      I think that UIMAUpdateRequestProcessor should have a parameter to avoid duplicate values on the updated field.

      A typical use case is:

      If you are using DictionaryAnnotator and a term matches more than once, it is added to the mapped field once per match. I think we should add a parameter to avoid inserting duplicates, since we are not preserving information about the position of the annotation.

      What do you think about it? I've already implemented this for branch 3x; I'm writing some tests and will submit a patch.

      Regards
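To make the use case concrete, here is a toy Java model (Java 9+) of an annotation step that writes one field value per dictionary match. DictionaryMatchDemo and mapMatches are hypothetical names, the whole-word regex is an assumed stand-in for real tokenization, and none of this is the UIMA DictionaryAnnotator API; it only illustrates why a term that occurs twice ends up twice in the mapped field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-in for an annotator that maps every dictionary match to a field value.
// This is NOT the UIMA DictionaryAnnotator API; it only models "one value per match".
public class DictionaryMatchDemo {

    /** Returns one value per dictionary hit, in the order the hits occur in the text. */
    public static List<String> mapMatches(String text, Set<String> dictionary) {
        List<String> fieldValues = new ArrayList<>();
        // Assumption: whole-word tokens; a real annotator tokenizes differently.
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            String token = m.group();
            if (dictionary.contains(token)) {
                fieldValues.add(token); // a repeated term is appended again
            }
        }
        return fieldValues;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("Solr", "UIMA");
        List<String> values = mapMatches("Solr uses UIMA; Solr indexes the result.", dict);
        System.out.println(values); // [Solr, UIMA, Solr]
    }
}
```

Because no position information is kept, the second "Solr" adds nothing to the mapped field except a duplicate value.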

      1. SOLR-2584.patch
        9 kB
        Elmer Garduno
      2. SOLR-2584.patch
        10 kB
        Elmer Garduno
      3. SOLR-2584.patch
        11 kB
        Koji Sekiguchi

        Activity

        Robert Muir made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Robert Muir added a comment -

        bulk close for 3.4

        Koji Sekiguchi made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Koji Sekiguchi added a comment -

        committed in trunk and 3x.

        Elmer Garduno added a comment -

        Thanks Koji

        Koji Sekiguchi made changes -
        Assignee Koji Sekiguchi [ koji ]
        Fix Version/s 3.4 [ 12316683 ]
        Fix Version/s 4.0 [ 12314992 ]
        Affects Version/s 1.4.1 [ 12315096 ]
        Koji Sekiguchi made changes -
        Attachment SOLR-2584.patch [ 12487099 ]
        Koji Sekiguchi added a comment -

        Thanks Elmer for the patch!

        I made some fixes in the attached patch:

        • remove unused checkNumDocs() from the test
        • use <lst/> for the fields parameter
        • use a List instead of a Set in processAdd() to keep the order of values in a multiValued field; I also added a check for this to the test case
        • add a null check in processAdd(), as fields could be null
        • add prettify to the javadoc
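The List-versus-Set point can be sketched in plain Java as follows. OrderPreservingDedup and uniq are hypothetical names, not the code in the patch; the sketch keeps the first occurrence of each value and preserves the original sequence, whereas iterating a plain HashSet would return the values in an unspecified order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper illustrating order-preserving deduplication with a List.
public class OrderPreservingDedup {

    /**
     * Returns the distinct values in first-seen order.
     * A HashSet would also drop duplicates, but its iteration order is
     * unspecified, so the sequence of a multiValued field could change.
     */
    public static List<Object> uniq(Collection<?> values) {
        List<Object> result = new ArrayList<>();
        for (Object v : values) {
            // List.contains is a linear scan; acceptable for short multiValued fields
            if (!result.contains(v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> in = Arrays.asList("c", "a", "c", "b", "a");
        System.out.println(uniq(in)); // [c, a, b]
    }
}
```

For long value lists, pairing the List with a HashSet used only for membership checks would avoid the quadratic scan while keeping the same order.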
        Elmer Garduno made changes -
        Attachment SOLR-2584.patch [ 12487037 ]
        Elmer Garduno added a comment -

        Added test cases and fixed an error.

        Elmer Garduno made changes -
        Attachment SOLR-2584.patch [ 12487031 ]
        Elmer Garduno added a comment -

        UniqFieldsUpdateProcessor removes duplicate values from the specified fields. It is useful after an UpdateRequestProcessor that can generate duplicate values for a field.
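A minimal sketch of the idea, assuming a document is modeled as a Map from field name to a List of values. This is a stand-in, not SolrInputDocument, and UniqFieldsSketch is a hypothetical class rather than the patch's UniqFieldsUpdateProcessor. Only the configured fields are deduplicated; a LinkedHashSet drops repeats while keeping first-seen order, and a null field list leaves the document untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical model: a document is a map from field name to its values.
// It mirrors the idea of UniqFieldsUpdateProcessor, not its actual Solr API.
public class UniqFieldsSketch {

    private final List<String> fields; // configured field names; may be null

    public UniqFieldsSketch(List<String> fields) {
        this.fields = fields;
    }

    /** Removes duplicate values from each configured field, keeping first-seen order. */
    public void process(Map<String, List<Object>> doc) {
        if (fields == null) {
            return; // no fields configured: leave the document unchanged
        }
        for (String name : fields) {
            List<Object> values = doc.get(name);
            if (values != null) {
                // LinkedHashSet drops repeats while keeping insertion order
                doc.put(name, new ArrayList<>(new LinkedHashSet<>(values)));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("concepts", new ArrayList<>(List.of("Solr", "UIMA", "Solr")));
        doc.put("tags", new ArrayList<>(List.of("x", "x")));
        new UniqFieldsSketch(List.of("concepts")).process(doc);
        System.out.println(doc); // {concepts=[Solr, UIMA], tags=[x, x]}
    }
}
```

Fields that are not configured, such as "tags" here, keep their duplicates, so the behaviour stays opt-in per field.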

        Elmer Garduno added a comment -

        Koji, I followed your approach and implemented it using an UpdateRequestProcessor.

        I'm submitting the patch for branch 3x.

        Koji Sekiguchi added a comment -

        Alternatively, we could implement this in a new update processor and place it after the UIMA update processor in the chain.

        Either way, I'd like to have this function.
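Why the position in the chain matters can be shown with a simplified Java model in which each step is a UnaryOperator over one field's values. ChainOrderDemo, ANNOTATE and UNIQ are hypothetical stand-ins for the UIMA processor and a dedup processor; real Solr update processors operate on whole documents and forward to the next processor themselves, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified chain: each step takes a field's values and returns new values.
public class ChainOrderDemo {

    // Stand-in for an annotation step; simulates a term that matched twice.
    public static final UnaryOperator<List<String>> ANNOTATE = values -> {
        List<String> out = new ArrayList<>(values);
        out.add("Solr");
        out.add("Solr");
        return out;
    };

    // Stand-in for a dedup step that keeps first-seen order.
    public static final UnaryOperator<List<String>> UNIQ =
            values -> new ArrayList<>(new LinkedHashSet<>(values));

    /** Runs the steps in order, feeding each step's output into the next. */
    @SafeVarargs
    public static List<String> run(List<String> input, UnaryOperator<List<String>>... steps) {
        List<String> current = input;
        for (UnaryOperator<List<String>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(), ANNOTATE, UNIQ)); // [Solr]
        System.out.println(run(List.of(), UNIQ, ANNOTATE)); // [Solr, Solr]
    }
}
```

Running the dedup step before the annotation step leaves the duplicates in place, which is why it would sit after the UIMA update processor in the chain.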

        Elmer Garduno created issue -

          People

          • Assignee:
            Koji Sekiguchi
            Reporter:
            Elmer Garduno
          • Votes:
            0
            Watchers:
            0

