UIMA
  1. UIMA
  2. UIMA-1861

SnowballAnnotator needs refactoring

    Details

      Description

      SnowballAnnotator is extending the deprecated JTextAnnotator_ImplBase, have some unused imports and generics should be enabled.
      Moreover the initialize() method fails due to the AnnotatorContext object being null when run in a 2.3.1-SNAPSHOT distribution.

      1. SnowballAnnotatorPatch2.txt
        19 kB
        Marshall Schor
      2. UIMA1861-patch.txt
        9 kB
        Tommaso Teofili

        Activity

        Hide
        Marshall Schor added a comment -

        Thanks for the fixes/patch. Here are a few suggested changes, to take advantage of JCas better. (I've attached this version as patch2 above)

        1) Although this annotator is set up as a JCas annotator, it is missing the JCas type for TokenAnnotation. Because of this, it goes to some lengths to not make use of this type where it could be useful. To add the JCas cover types for this is easy: open the desc/SnowballAnnotator.xml descriptor in the Component Descriptor editor in Eclipse, click the typesystem page, and push the JCasGen button. This will generate the missing classes for the types and add them to the project.

        If the TokenAnnotation JCas type was available, the lines:

        (original)
        // iterate over all token annotations and add stem if available
        FSIterator tokenIterator = aJCas.getCas().getAnnotationIndex(this.tokenAnnotation).iterator();

        (with patch)

        // iterate over all token annotations and add stem if available
        FSIterator tokenIterator = aJCas.getAnnotationIndex(this.tokenAnnotation).iterator();
        // note: causes a warning leading to a suppress warnings, related to generics

        could be written
        // iterate over all token annotations and add stem if available
        FSIterator<TokenAnnotation> tokenIterator = (FSIterator<TokenAnnotation>)(FSIterator<?>) // very ugly "double-fisted cast"
        aJCas.getAnnotationIndex(TokenAnnotation.type).iterator();

        and the code in the bottom method (typeSystemInit) would not be needed. The "double-fisted cast" is described here http://markmail.org/message/w5kpympalj6tvqq3.

        Alternatively, to avoid the double cast, the FSIterator could be over the type Annotation, and an explicit cast of the next() could be done to TokenAnnotation:
        // iterate over all token annotations and add stem if available
        FSIterator<Annotation> tokenIterator = aJCas.getAnnotationIndex(TokenAnnotation.type).iterator();
        ...
        TokenAnnotation annot = (TokenAnnotation) tokenIterator.next();

        The line further on down which reads

        // get stemmer result and set annotation feature
        annot.setStringValue(this.tokenAnnotationStemmFeature, stemmer.getCurrent());

        would be better written (using JCas style) as:

        // get stemmer result and set annotation feature
        annot.setStem(stemmer.getCurrent());

        If the JCas style is used, the typeSystemInit method can be deleted, along with all the constants added to support it, because the things its computing are not used. In any case, it should not be called in the process method. (The UIMA framework calls it directly, but only when the type system changes).

        Show
        Marshall Schor added a comment - Thanks for the fixes/patch. Here are a few suggested changes, to take advantage of JCas better. (I've attached this version as patch2 above) 1) Although this annotator is set up as a JCas annotator, it is missing the JCas type for TokenAnnotation. Because of this, it goes to some lengths to not make use of this type where it could be useful. To add the JCas cover types for this is easy: open the desc/SnowballAnnotator.xml descriptor in the Component Descriptor editor in Eclipse, click the typesystem page, and push the JCasGen button. This will generate the missing classes for the types and add them to the project. If the TokenAnnotation JCas type was available, the lines: (original) // iterate over all token annotations and add stem if available FSIterator tokenIterator = aJCas.getCas().getAnnotationIndex(this.tokenAnnotation).iterator(); (with patch) // iterate over all token annotations and add stem if available FSIterator tokenIterator = aJCas.getAnnotationIndex(this.tokenAnnotation).iterator(); // note: causes a warning leading to a suppress warnings, related to generics could be written // iterate over all token annotations and add stem if available FSIterator<TokenAnnotation> tokenIterator = (FSIterator<TokenAnnotation>)(FSIterator<?>) // very ugly "double-fisted cast" aJCas.getAnnotationIndex(TokenAnnotation.type).iterator(); and the code in the bottom method (typeSystemInit) would not be needed. The "double-fisted cast" is described here http://markmail.org/message/w5kpympalj6tvqq3 . Alternatively, to avoid the double cast, the FSIterator could be over the type Annotation, and an explicit cast of the next() could be done to TokenAnnotation: // iterate over all token annotations and add stem if available FSIterator<Annotation> tokenIterator = aJCas.getAnnotationIndex(TokenAnnotation.type).iterator(); ... TokenAnnotation annot = (TokenAnnotation) tokenIterator.next(); The line further on down which reads // get stemmer result and set annotation feature annot.setStringValue(this.tokenAnnotationStemmFeature, stemmer.getCurrent()); would be better written (using JCas style) as: // get stemmer result and set annotation feature annot.setStem(stemmer.getCurrent()); If the JCas style is used, the typeSystemInit method can be deleted, along with all the constants added to support it, because the things its computing are not used. In any case, it should not be called in the process method. (The UIMA framework calls it directly, but only when the type system changes).
        Hide
        Marshall Schor added a comment -

        patch2

        Show
        Marshall Schor added a comment - patch2
        Hide
        Tommaso Teofili added a comment -

        Thanks Marshall, this patch looks good to me I agree with you that having typeSystem classes generated and retrieved from JCas sounds much better, I added Apache License headers to generated type system files and the testReconfigure() method inside SnowballAnnotatorTest to check that initialize() and reconfigure() work properly. Now committing the patch

        Show
        Tommaso Teofili added a comment - Thanks Marshall, this patch looks good to me I agree with you that having typeSystem classes generated and retrieved from JCas sounds much better, I added Apache License headers to generated type system files and the testReconfigure() method inside SnowballAnnotatorTest to check that initialize() and reconfigure() work properly. Now committing the patch
        Hide
        Tommaso Teofili added a comment -

        Fixed

        Show
        Tommaso Teofili added a comment - Fixed
        Hide
        Richard Eckart de Castilho added a comment -

        Reopening issue so it can be affected can be changed to a different version.

        Show
        Richard Eckart de Castilho added a comment - Reopening issue so it can be affected can be changed to a different version.
        Hide
        Richard Eckart de Castilho added a comment -

        Changing affects from 2.3.1 to 2.3.1Addons so version 2.3.1 can be removed.

        Show
        Richard Eckart de Castilho added a comment - Changing affects from 2.3.1 to 2.3.1Addons so version 2.3.1 can be removed.
        Hide
        Richard Eckart de Castilho added a comment -

        Setting to resolution state before reopening to change affects.

        Show
        Richard Eckart de Castilho added a comment - Setting to resolution state before reopening to change affects.

          People

          • Assignee:
            Tommaso Teofili
            Reporter:
            Tommaso Teofili
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development