UIMA
  1. UIMA
  2. UIMA-1861

SnowballAnnotator needs refactoring

    Details

      Description

      SnowballAnnotator is extending the deprecated JTextAnnotator_ImplBase, have some unused imports and generics should be enabled.
      Moreover the initialize() method fails due to the AnnotatorContext object being null when run in a 2.3.1-SNAPSHOT distribution.

      1. SnowballAnnotatorPatch2.txt
        19 kB
        Marshall Schor
      2. UIMA1861-patch.txt
        9 kB
        Tommaso Teofili

        Activity

        Tommaso Teofili created issue -
        Tommaso Teofili made changes -
        Field Original Value New Value
        Attachment UIMA1861-patch.txt [ 12453350 ]
        Hide
        Marshall Schor added a comment -

        Thanks for the fixes/patch. Here are a few suggested changes, to take advantage of JCas better. (I've attached this version as patch2 above)

        1) Although this annotator is set up as a JCas annotator, it is missing the JCas type for TokenAnnotation. Because of this, it goes to some lengths to not make use of this type where it could be useful. To add the JCas cover types for this is easy: open the desc/SnowballAnnotator.xml descriptor in the Component Descriptor editor in Eclipse, click the typesystem page, and push the JCasGen button. This will generate the missing classes for the types and add them to the project.

        If the TokenAnnotation JCas type was available, the lines:

        (original)
        // iterate over all token annotations and add stem if available
        FSIterator tokenIterator = aJCas.getCas().getAnnotationIndex(this.tokenAnnotation).iterator();

        (with patch)

        // iterate over all token annotations and add stem if available
        FSIterator tokenIterator = aJCas.getAnnotationIndex(this.tokenAnnotation).iterator();
        // note: causes a warning leading to a suppress warnings, related to generics

        could be written
        // iterate over all token annotations and add stem if available
        FSIterator<TokenAnnotation> tokenIterator = (FSIterator<TokenAnnotation>)(FSIterator<?>) // very ugly "double-fisted cast"
        aJCas.getAnnotationIndex(TokenAnnotation.type).iterator();

        and the code in the bottom method (typeSystemInit) would not be needed. The "double-fisted cast" is described here http://markmail.org/message/w5kpympalj6tvqq3.

        Alternatively, to avoid the double cast, the FSIterator could be over the type Annotation, and an explicit cast of the next() could be done to TokenAnnotation:
        // iterate over all token annotations and add stem if available
        FSIterator<Annotation> tokenIterator = aJCas.getAnnotationIndex(TokenAnnotation.type).iterator();
        ...
        TokenAnnotation annot = (TokenAnnotation) tokenIterator.next();

        The line further on down which reads

        // get stemmer result and set annotation feature
        annot.setStringValue(this.tokenAnnotationStemmFeature, stemmer.getCurrent());

        would be better written (using JCas style) as:

        // get stemmer result and set annotation feature
        annot.setStem(stemmer.getCurrent());

        If the JCas style is used, the typeSystemInit method can be deleted, along with all the constants added to support it, because the things its computing are not used. In any case, it should not be called in the process method. (The UIMA framework calls it directly, but only when the type system changes).

        Show
        Marshall Schor added a comment - Thanks for the fixes/patch. Here are a few suggested changes, to take advantage of JCas better. (I've attached this version as patch2 above) 1) Although this annotator is set up as a JCas annotator, it is missing the JCas type for TokenAnnotation. Because of this, it goes to some lengths to not make use of this type where it could be useful. To add the JCas cover types for this is easy: open the desc/SnowballAnnotator.xml descriptor in the Component Descriptor editor in Eclipse, click the typesystem page, and push the JCasGen button. This will generate the missing classes for the types and add them to the project. If the TokenAnnotation JCas type was available, the lines: (original) // iterate over all token annotations and add stem if available FSIterator tokenIterator = aJCas.getCas().getAnnotationIndex(this.tokenAnnotation).iterator(); (with patch) // iterate over all token annotations and add stem if available FSIterator tokenIterator = aJCas.getAnnotationIndex(this.tokenAnnotation).iterator(); // note: causes a warning leading to a suppress warnings, related to generics could be written // iterate over all token annotations and add stem if available FSIterator<TokenAnnotation> tokenIterator = (FSIterator<TokenAnnotation>)(FSIterator<?>) // very ugly "double-fisted cast" aJCas.getAnnotationIndex(TokenAnnotation.type).iterator(); and the code in the bottom method (typeSystemInit) would not be needed. The "double-fisted cast" is described here http://markmail.org/message/w5kpympalj6tvqq3 . Alternatively, to avoid the double cast, the FSIterator could be over the type Annotation, and an explicit cast of the next() could be done to TokenAnnotation: // iterate over all token annotations and add stem if available FSIterator<Annotation> tokenIterator = aJCas.getAnnotationIndex(TokenAnnotation.type).iterator(); ... TokenAnnotation annot = (TokenAnnotation) tokenIterator.next(); The line further on down which reads // get stemmer result and set annotation feature annot.setStringValue(this.tokenAnnotationStemmFeature, stemmer.getCurrent()); would be better written (using JCas style) as: // get stemmer result and set annotation feature annot.setStem(stemmer.getCurrent()); If the JCas style is used, the typeSystemInit method can be deleted, along with all the constants added to support it, because the things its computing are not used. In any case, it should not be called in the process method. (The UIMA framework calls it directly, but only when the type system changes).
        Hide
        Marshall Schor added a comment -

        patch2

        Show
        Marshall Schor added a comment - patch2
        Marshall Schor made changes -
        Attachment SnowballAnnotatorPatch2.txt [ 12453717 ]
        Hide
        Tommaso Teofili added a comment -

        Thanks Marshall, this patch looks good to me I agree with you that having typeSystem classes generated and retrieved from JCas sounds much better, I added Apache License headers to generated type system files and the testReconfigure() method inside SnowballAnnotatorTest to check that initialize() and reconfigure() work properly. Now committing the patch

        Show
        Tommaso Teofili added a comment - Thanks Marshall, this patch looks good to me I agree with you that having typeSystem classes generated and retrieved from JCas sounds much better, I added Apache License headers to generated type system files and the testReconfigure() method inside SnowballAnnotatorTest to check that initialize() and reconfigure() work properly. Now committing the patch
        Hide
        Tommaso Teofili added a comment -

        Fixed

        Show
        Tommaso Teofili added a comment - Fixed
        Tommaso Teofili made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Marshall Schor made changes -
        Fix Version/s 2.3.1SDK [ 12315344 ]
        Fix Version/s 2.3.1 [ 12314751 ]
        Tommaso Teofili made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Tommaso Teofili made changes -
        Fix Version/s 2.3.1Addons [ 12316093 ]
        Fix Version/s 2.3.1SDK [ 12315344 ]
        Tommaso Teofili made changes -
        Status Reopened [ 4 ] Closed [ 6 ]
        Resolution Fixed [ 1 ]
        Hide
        Richard Eckart de Castilho added a comment -

        Reopening issue so it can be affected can be changed to a different version.

        Show
        Richard Eckart de Castilho added a comment - Reopening issue so it can be affected can be changed to a different version.
        Richard Eckart de Castilho made changes -
        Resolution Fixed [ 1 ]
        Status Closed [ 6 ] Reopened [ 4 ]
        Hide
        Richard Eckart de Castilho added a comment -

        Changing affects from 2.3.1 to 2.3.1Addons so version 2.3.1 can be removed.

        Show
        Richard Eckart de Castilho added a comment - Changing affects from 2.3.1 to 2.3.1Addons so version 2.3.1 can be removed.
        Richard Eckart de Castilho made changes -
        Affects Version/s 2.3.1Addons [ 12316093 ]
        Affects Version/s 2.3.1 [ 12314751 ]
        Hide
        Richard Eckart de Castilho added a comment -

        Setting to resolution state before reopening to change affects.

        Show
        Richard Eckart de Castilho added a comment - Setting to resolution state before reopening to change affects.
        Richard Eckart de Castilho made changes -
        Status Reopened [ 4 ] Closed [ 6 ]
        Resolution Fixed [ 1 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        5d 18h 49m 1 Tommaso Teofili 03/Sep/10 06:35
        Resolved Resolved Reopened Reopened
        215d 4h 33m 1 Tommaso Teofili 06/Apr/11 11:08
        Closed Closed Reopened Reopened
        784d 9h 58m 1 Richard Eckart de Castilho 29/May/13 21:16
        Reopened Reopened Closed Closed
        15m 38s 2 Richard Eckart de Castilho 29/May/13 21:22

          People

          • Assignee:
            Tommaso Teofili
            Reporter:
            Tommaso Teofili
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development