  1. OpenNLP
  2. OPENNLP-1520

Generated Java code for stemmers is broken, and should be re-generated


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1
    • Fix Version/s: 2.3.2
    • Component/s: Stemmer
    • Labels: None

    Description

      Recursive stemming is completely broken. It seems hard to actually trigger, but it is the intended usage of the methodObject and method in the Among class, which are called reflectively. First, the call tries to invoke a private method from outside the declaring class (from a parent class, SnowballProgram), which fails with an IllegalAccessException. Second, even if that call were permitted, every such invocation would run on the same shared, static object rather than on the relevant stemmer instance.
      This was fixed upstream 8 years ago, but the generated code in opennlp-tools appears to be 10 years old. I would urge you to re-generate that code.
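
      The two problems described above can be reproduced in miniature without OpenNLP. This is only a sketch: ToyProgram stands in for SnowballProgram, ToyStemmer for a generated stemmer, and r_condition and SHARED are hypothetical names, not identifiers from the generated code. The classes are deliberately separate top-level classes, since nested classes may be granted reflective access to each other's private members on newer JDKs.

```java
import java.lang.reflect.Method;

// Entry point declared first so the file can be launched directly.
class ReflectiveCallDemo {

    /** True if the parent class is refused reflective access to the private method. */
    static boolean parentIsDeniedAccess() {
        ToyStemmer stemmer = new ToyStemmer("talo");
        try {
            Method m = ToyStemmer.class.getDeclaredMethod("r_condition");
            stemmer.callReflectively(m, stemmer);
            return false;
        } catch (IllegalAccessException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Forces access, then invokes the condition on the given target object. */
    static boolean forcedCallOn(ToyStemmer target) {
        try {
            Method m = ToyStemmer.class.getDeclaredMethod("r_condition");
            m.setAccessible(true); // bypasses the access check only
            return new ToyStemmer("talo").callReflectively(m, target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("access denied: " + parentIsDeniedAccess());
        System.out.println("result on shared object: " + forcedCallOn(ToyStemmer.SHARED));
        System.out.println("result on real instance: " + forcedCallOn(new ToyStemmer("talo")));
    }
}

// Hypothetical stand-in for the parent class that performs the reflective call.
class ToyProgram {
    protected String current = "";

    boolean callReflectively(Method m, Object target) throws ReflectiveOperationException {
        // The caller of invoke() is ToyProgram, not the class declaring the method.
        return (Boolean) m.invoke(target);
    }
}

// Hypothetical stand-in for a generated stemmer.
class ToyStemmer extends ToyProgram {
    // One shared object for all reflective calls; its state is never the word being stemmed.
    static final ToyStemmer SHARED = new ToyStemmer("");

    ToyStemmer(String word) {
        current = word;
    }

    // Private condition that depends on per-instance state.
    private boolean r_condition() {
        return !current.isEmpty();
    }
}
```

      Forcing access with setAccessible(true) would hide the exception but not the second problem: the condition evaluated on the shared object reads that object's state, not the state of the stemmer doing the work.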
       
      Commit that fixed the Java code generation: https://github.com/snowballstem/snowball/commit/0f9d3d64ab965447a7f638b8ededc924f3efca75
       
      Relevant sample stemmer with broken Java:
      https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/finnishStemmer.java
       
      Stack trace showing illegal reflection access:
       

      2023-10-26 23:21:44.200 class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private" 
      exception=java.lang.IllegalAccessException: class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
        at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392) 
        at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674) 
        at java.base/java.lang.reflect.Method.invoke(Method.java:560) 
        at opennlp.tools.stemmer.snowball.SnowballProgram.find_among_b(SnowballProgram.java:353) 
        at opennlp.tools.stemmer.snowball.finnishStemmer.r_case_ending(finnishStemmer.java:480) 
        at opennlp.tools.stemmer.snowball.finnishStemmer.stem(finnishStemmer.java:1003) 
        at opennlp.tools.stemmer.snowball.SnowballStemmer.stem(SnowballStemmer.java:131) 
        at com.yahoo.language.opennlp.OpenNlpTokenizer.processToken(OpenNlpTokenizer.java:64) 
        at com.yahoo.language.opennlp.OpenNlpTokenizer.lambda$tokenize$0(OpenNlpTokenizer.java:54) 
        at com.yahoo.language.simple.SimpleTokenizer.tokenize(SimpleTokenizer.java:74) 
        at com.yahoo.language.opennlp.OpenNlpTokenizer.tokenize(OpenNlpTokenizer.java:54) 
        at com.yahoo.vespa.indexinglanguage.linguistics.LinguisticsAnnotator.annotate(LinguisticsAnnotator.java:76)
      ...
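
      The trace fails at Method.invoke because the access check is made against the calling class, SnowballProgram. One way to avoid both that check and the shared receiver is to resolve a MethodHandle inside the stemmer class itself and invoke it on the current instance. The sketch below illustrates that idea under stated assumptions; it is not taken from the linked commit, and HandleProgram, HandleStemmer, r_condition and check are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Entry point declared first so the file can be launched directly.
class MethodHandleDemo {

    /** Runs the private condition for the given word through the parent class. */
    static boolean run(String word) {
        return new HandleStemmer(word).check();
    }

    public static void main(String[] args) {
        System.out.println(run("talo"));
        System.out.println(run(""));
    }
}

// Hypothetical stand-in for the parent class.
class HandleProgram {
    protected String current = "";

    boolean call(MethodHandle condition) {
        try {
            // Access was checked when the handle was looked up, so the parent may
            // invoke it; the receiver is this instance, not a shared object.
            return (boolean) condition.invoke(this);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}

// Hypothetical stand-in for a generated stemmer.
class HandleStemmer extends HandleProgram {
    private static final MethodHandle R_CONDITION;

    static {
        try {
            // The lookup is created inside this class, so it can see private members.
            R_CONDITION = MethodHandles.lookup().findVirtual(
                    HandleStemmer.class, "r_condition", MethodType.methodType(boolean.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    HandleStemmer(String word) {
        current = word;
    }

    // Private condition that depends on per-instance state.
    private boolean r_condition() {
        return !current.isEmpty();
    }

    boolean check() {
        return call(R_CONDITION);
    }
}
```

      Because the handle carries no receiver of its own, each stemmer instance evaluates the condition against its own state.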

       
       
      Best, Jon Marius Venstad, developer at vespa.ai

          People

            Assignee: Richard Zowalla (rzo1)
            Reporter: Jon Marius Venstad (jonmv)
            Votes: 0
            Watchers: 4
