[OPENNLP-1309] NameFinderME - Unexpected result using unchanged training data - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Bug
Affects Version/s: 1.9.2
Fix Version/s: None
Component/s: Name Finder
Labels: None

Description

Hello,

Based on NameFinderMETest.java (method testNameFinder()), I wrote a simple test and changed the test sentence
from (1):

String[] sentence = {"Alisa",
 "appreciated",
 "the",
 "hint",
 "and",
 "enjoyed",
 "a",
 "delicious",
 "traditional",
 "meal."};

to (2):

String[] sentence = {"Alisa",
 "and",
 "Mike",
 "appreciated",
 "the",
 "hint",
 "and",
 "enjoyed",
 "a",
 "delicious",
 "traditional",
 "meal."};

(I only added "and Mike".) I expected 2 results (the two names Alisa and Mike), because both names are annotated in the training data, but I only get 1 result (Mike) for (2). I used the unchanged training data file AnnotatedSentences.txt.

Can anyone tell me what's wrong? Thanks.
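The expectation above rests on both names being annotated in AnnotatedSentences.txt. A minimal stdlib-only sketch for checking that directly, assuming the file uses OpenNLP's name-finder markup "<START> ... <END>" (or "<START:type> ... <END>") with space-separated tokens and the ISO-8859-1 encoding used in the test code. AnnotatedNameCounter is a hypothetical helper, not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an OpenNLP class): counts the names annotated with
// the assumed "<START> ... <END>" / "<START:type> ... <END>" markup.
public class AnnotatedNameCounter {

  // Maps each annotated name (its tokens joined by one space) to how often it occurs.
  public static Map<String, Integer> countNames(List<String> lines) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String line : lines) {
      List<String> current = null; // tokens of the name being read; null outside a name
      for (String token : line.trim().split("\\s+")) {
        if (token.startsWith("<START") && token.endsWith(">")) {
          current = new ArrayList<>();
        } else if (token.equals("<END>")) {
          if (current != null && !current.isEmpty()) {
            counts.merge(String.join(" ", current), 1, Integer::sum);
          }
          current = null;
        } else if (current != null) {
          current.add(token);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) throws IOException {
    // args[0]: path to the training file, e.g. AnnotatedSentences.txt
    List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
    Map<String, Integer> counts = countNames(lines);
    System.out.println("Alisa: " + counts.getOrDefault("Alisa", 0));
    System.out.println("Mike: " + counts.getOrDefault("Mike", 0));
  }
}
```

Run it with the path to the training file as the only argument; a count of zero for either name would contradict the premise of the report.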

Test code:

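import java.io.File;
import java.io.IOException;
import java.util.Collections;

import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinder;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;
import opennlp.tools.util.TrainingParameters;

public class NameFinderTest {

  public static void main(String[] args) throws IOException {
    // Path to the unchanged training data file
    String trainingDataFilePath = "opennlp/tools/namefind/AnnotatedSentences.txt";
    String encoding = "ISO-8859-1";
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(
        new PlainTextByLineStream(
            new MarkableFileInputStreamFactory(new File(trainingDataFilePath)), encoding));

    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, 70);
    params.put(TrainingParameters.CUTOFF_PARAM, 1);

    // Train with the default (null) name type and a BIO codec
    TokenNameFinderModel nameFinderModel = NameFinderME.train("eng", null, sampleStream,
        params, TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
    TokenNameFinder nameFinder = new NameFinderME(nameFinderModel);

    // Now test whether it detects the names in the sample sentence
    String[] sentence = {"Alisa", "and", "Mike", "appreciated", "the", "hint",
        "and", "enjoyed", "a", "delicious", "traditional", "meal."};
    Span[] names = nameFinder.find(sentence);
    if (names != null && names.length != 0) {
      System.out.println(" > Found [" + names.length + "] results");
      int resultNo = 1;
      for (Span name : names) {
        // Join the tokens covered by the span [start, end)
        StringBuilder personName = new StringBuilder();
        for (int i = name.getStart(); i < name.getEnd(); i++) {
          personName.append(sentence[i]).append(' ');
        }
        System.out.println(" > Result " + resultNo++ + ": Type: [" + name.getType()
            + "] : PersonName: [" + personName + "]\t [probability=" + name.getProb() + "]");
      }
    } else {
      System.out.println(" > No results found");
    }
  }
}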

Result for (1):

Indexing events with TwoPass using cutoff of 1
Computing event counts... done. 1392 events
Indexing... done.
Collecting events... Done indexing in 0.22 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1392
Number of Outcomes: 3
Number of Predicates: 9164
Computing model parameters...
Performing 70 iterations.
1: . (1355/1392) 0.9734195402298851
2: . (1383/1392) 0.9935344827586207
3: . (1390/1392) 0.9985632183908046
4: . (1390/1392) 0.9985632183908046
5: . (1391/1392) 0.9992816091954023
6: . (1392/1392) 1.0
7: . (1392/1392) 1.0
8: . (1392/1392) 1.0
9: . (1392/1392) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (1392/1392) 1.0
...done.

Found [1] results
Result 1: Type: [default] : PersonName: [Alisa ] [probability=0.5483001511243855]

Result for (2):

Indexing events with TwoPass using cutoff of 1
Computing event counts... done. 1392 events
Indexing... done.
Collecting events... Done indexing in 0.22 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1392
Number of Outcomes: 3
Number of Predicates: 9164
Computing model parameters...
Performing 70 iterations.
1: . (1355/1392) 0.9734195402298851
2: . (1383/1392) 0.9935344827586207
3: . (1390/1392) 0.9985632183908046
4: . (1390/1392) 0.9985632183908046
5: . (1391/1392) 0.9992816091954023
6: . (1392/1392) 1.0
7: . (1392/1392) 1.0
8: . (1392/1392) 1.0
9: . (1392/1392) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (1392/1392) 1.0
...done.

Found [1] results
Result 1: Type: [default] : PersonName: [Mike ] [probability=0.460685209028902]
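The training logs for (1) and (2) are identical, so the two runs differ only in the test sentence. Each iteration line reports the number of correctly predicted training events over the total, for example 1355/1392 ≈ 0.9734. A minimal sketch that recomputes those values from the logged counts and applies a stop rule; the rule (stop once the accuracy is within the logged tolerance of 1.0E-5 of each of the previous three iterations) is an assumption chosen because it reproduces where this log stops, not a quotation of OpenNLP's trainer code. TrainingLogCheck is a hypothetical class.

```java
// Hypothetical re-computation of the training log above (not OpenNLP code).
public class TrainingLogCheck {

  // Tolerance taken from the "Stopping: ..." line of the log.
  static final double TOLERANCE = 1.0E-5;

  // Returns the 1-based iteration at which training would stop, or -1 if it never does.
  public static int stopIteration(int[] correct, int total) {
    double[] acc = new double[correct.length];
    for (int i = 0; i < correct.length; i++) {
      acc[i] = (double) correct[i] / total;
      System.out.println((i + 1) + ": (" + correct[i] + "/" + total + ") " + acc[i]);
      // Assumed rule: accuracy unchanged relative to each of the three previous iterations.
      if (i >= 3
          && Math.abs(acc[i] - acc[i - 1]) < TOLERANCE
          && Math.abs(acc[i] - acc[i - 2]) < TOLERANCE
          && Math.abs(acc[i] - acc[i - 3]) < TOLERANCE) {
        return i + 1;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Correct/total counts copied from the log for iterations 1 to 9.
    int[] correct = {1355, 1383, 1390, 1390, 1391, 1392, 1392, 1392, 1392};
    System.out.println("Stops at iteration " + stopIteration(correct, 1392));
  }
}
```

With the counts from the log it reports iteration 9, which matches the "Stopping: change in training set accuracy less than 1.0E-5" line in both runs.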
