OpenNLP / OPENNLP-1309

NameFinderME - Unexpected result using unchanged training data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 1.9.2
    • Fix Version/s: None
    • Component/s: Name Finder
    • Labels: None

    Description

       

      Hello,

      Based on testNameFinder() in NameFinderMETest.java, I wrote a simple test and changed the test sentence
      from (1):

      String[] sentence = {"Alisa",
          "appreciated",
          "the",
          "hint",
          "and",
          "enjoyed",
          "a",
          "delicious",
          "traditional",
          "meal."};
      

      to (2):

      String[] sentence = {"Alisa",
          "and",
          "Mike",
          "appreciated",
          "the",
          "hint",
          "and",
          "enjoyed",
          "a",
          "delicious",
          "traditional",
          "meal."};
      

      I only added "and Mike" and expected to get 2 results (the two names Alisa and Mike), because both names are annotated in the training data. Instead, I get only 1 result (Mike) for (2). I used the training data file AnnotatedSentences.txt unchanged.
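
      To make the expectation concrete, here is a minimal sketch that decodes one BIO tag per token into name spans. It uses only the JDK and does not call OpenNLP. The tag strings "default-start", "default-cont" and "other" are an assumption about how the three outcomes of a BioCodec model with the type "default" are named, and the class BioExpectation and its decode method are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BioExpectation {

    // Decodes one BIO tag per token into [start, end) token index pairs.
    // Tag names are assumed, not taken from the OpenNLP sources.
    static List<int[]> decode(String[] tags) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // index of the currently open span, or -1 if none
        for (int i = 0; i < tags.length; i++) {
            boolean begins = tags[i].endsWith("-start");
            boolean continues = tags[i].endsWith("-cont") && start >= 0;
            if (start >= 0 && !continues) {
                // Any tag other than a continuation closes the open span.
                spans.add(new int[] {start, i});
                start = -1;
            }
            if (begins) {
                start = i;
            }
        }
        if (start >= 0) {
            // Close a span that runs to the end of the sentence.
            spans.add(new int[] {start, tags.length});
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] sentence = {"Alisa", "and", "Mike", "appreciated", "the", "hint",
            "and", "enjoyed", "a", "delicious", "traditional", "meal."};
        // The tagging I expected for sentence (2): both names start a span.
        String[] expected = {"default-start", "other", "default-start", "other", "other",
            "other", "other", "other", "other", "other", "other", "other"};
        for (int[] span : decode(expected)) {
            System.out.println(String.join(" ", Arrays.copyOfRange(sentence, span[0], span[1])));
        }
    }
}
```

      With the expected tagging this yields two spans, one for "Alisa" and one for "Mike". The result I actually get corresponds to a tagging in which the first token is "other".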

      Can anyone tell me what's wrong? Thanks.

      Test code:

       

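      // Imports needed by this snippet (OpenNLP 1.9.x).
      import java.io.File;
      import java.util.Collections;
      import opennlp.tools.namefind.BioCodec;
      import opennlp.tools.namefind.NameFinderME;
      import opennlp.tools.namefind.NameSample;
      import opennlp.tools.namefind.NameSampleDataStream;
      import opennlp.tools.namefind.TokenNameFinder;
      import opennlp.tools.namefind.TokenNameFinderFactory;
      import opennlp.tools.namefind.TokenNameFinderModel;
      import opennlp.tools.util.MarkableFileInputStreamFactory;
      import opennlp.tools.util.ObjectStream;
      import opennlp.tools.util.PlainTextByLineStream;
      import opennlp.tools.util.Span;
      import opennlp.tools.util.TrainingParameters;

      // Train a name finder model on the unchanged training data.
      String trainingDatafilePath = "opennlp/tools/namefind/AnnotatedSentences.txt";
      String encoding = "ISO-8859-1";
      ObjectStream<NameSample> sampleStream = new NameSampleDataStream(
          new PlainTextByLineStream(
              new MarkableFileInputStreamFactory(new File(trainingDatafilePath)), encoding));

      TrainingParameters params = new TrainingParameters();
      params.put(TrainingParameters.ITERATIONS_PARAM, 70);
      params.put(TrainingParameters.CUTOFF_PARAM, 1);
      TokenNameFinderModel nameFinderModel = NameFinderME.train("eng", null, sampleStream,
          params, TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
      TokenNameFinder nameFinder = new NameFinderME(nameFinderModel);

      // Now test whether it can detect the names in the sample sentence.
      String[] sentence = {"Alisa", "and", "Mike", "appreciated", "the", "hint",
          "and", "enjoyed", "a", "delicious", "traditional", "meal."};
      Span[] names = nameFinder.find(sentence);
      if (names != null && names.length != 0) {
        System.out.println(" > Found [" + names.length + "] results");
        int resultNumber = 1; // running index, so that each hit gets its own number
        for (Span name : names) {
          // Join the tokens covered by the span (end index is exclusive).
          StringBuilder personName = new StringBuilder();
          for (int i = name.getStart(); i < name.getEnd(); i++) {
            personName.append(sentence[i]).append(" ");
          }
          System.out.println(" > Result " + resultNumber++ + ": Type: [" + name.getType()
              + "] : PersonName: [" + personName + "]\t [probability=" + name.getProb() + "]");
        }
      } else {
        System.out.println(" > No results found");
      }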
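
      The reporting step can also be factored into a small helper. The following is a minimal sketch that uses only the JDK. The class ResultFormatter and the method describe are hypothetical names. With OpenNLP on the classpath, the arguments would come from Span.getType(), Span.getStart(), Span.getEnd() and Span.getProb().

```java
import java.util.Arrays;

public class ResultFormatter {

    // Formats one hit. The end index is exclusive, matching the loop bounds
    // used in the test code. All names here are illustrative.
    static String describe(int resultNumber, String type, String[] tokens,
                           int start, int end, double prob) {
        // Join the covered tokens with single spaces, without a trailing space.
        String text = String.join(" ", Arrays.copyOfRange(tokens, start, end));
        return " > Result " + resultNumber + ": Type: [" + type + "] : PersonName: ["
            + text + "]\t [probability=" + prob + "]";
    }

    public static void main(String[] args) {
        String[] sentence = {"Alisa", "and", "Mike", "appreciated", "the", "hint",
            "and", "enjoyed", "a", "delicious", "traditional", "meal."};
        // Values of the single hit reported for sentence (2): token 2, type "default".
        System.out.println(describe(1, "default", sentence, 2, 3, 0.460685209028902));
    }
}
```

      Passing the result number as a parameter keeps the numbering correct when more than one name is found.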
      

       

       

      Result for (1):

      Indexing events with TwoPass using cutoff of 1
      Computing event counts... done. 1392 events
      Indexing... done.
      Collecting events... Done indexing in 0.22 s.
      Incorporating indexed data for training...
      done.
      Number of Event Tokens: 1392
      Number of Outcomes: 3
      Number of Predicates: 9164
      Computing model parameters...
      Performing 70 iterations.
      1: . (1355/1392) 0.9734195402298851
      2: . (1383/1392) 0.9935344827586207
      3: . (1390/1392) 0.9985632183908046
      4: . (1390/1392) 0.9985632183908046
      5: . (1391/1392) 0.9992816091954023
      6: . (1392/1392) 1.0
      7: . (1392/1392) 1.0
      8: . (1392/1392) 1.0
      9: . (1392/1392) 1.0
      Stopping: change in training set accuracy less than 1.0E-5
      Stats: (1392/1392) 1.0
      ...done.

      Found [1] results
      Result 1: Type: [default] : PersonName: [Alisa ] [probability=0.5483001511243855]

       

      Result for (2):

      Indexing events with TwoPass using cutoff of 1
      Computing event counts... done. 1392 events
      Indexing... done.
      Collecting events... Done indexing in 0.22 s.
      Incorporating indexed data for training...
      done.
      Number of Event Tokens: 1392
      Number of Outcomes: 3
      Number of Predicates: 9164
      Computing model parameters...
      Performing 70 iterations.
      1: . (1355/1392) 0.9734195402298851
      2: . (1383/1392) 0.9935344827586207
      3: . (1390/1392) 0.9985632183908046
      4: . (1390/1392) 0.9985632183908046
      5: . (1391/1392) 0.9992816091954023
      6: . (1392/1392) 1.0
      7: . (1392/1392) 1.0
      8: . (1392/1392) 1.0
      9: . (1392/1392) 1.0
      Stopping: change in training set accuracy less than 1.0E-5
      Stats: (1392/1392) 1.0
      ...done.

      Found [1] results
      Result 1: Type: [default] : PersonName: [Mike ] [probability=0.460685209028902]

          People

            Assignee: Unassigned
            Reporter: Michael (micha2017)