Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Bug
Affects Version/s: 1.9.2
Description
Hello,
Based on NameFinderMETest.java (method testNameFinder()), I wrote a simple test and changed the test sentence
from (1):
String[] sentence = {"Alisa", "appreciated", "the", "hint", "and", "enjoyed", "a", "delicious", "traditional", "meal."};
to (2):
String[] sentence = {"Alisa", "and", "Mike", "appreciated", "the", "hint", "and", "enjoyed", "a", "delicious", "traditional", "meal."};
The only change is the added "and Mike". I expected 2 results (the names Alisa and Mike), because both names are annotated in the training data, but for (2) I only get 1 result (Mike). For both runs I used the unchanged training data file AnnotatedSentences.txt.
Can anyone tell me what's wrong? Thanks.
Test code:
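import java.io.File;
import java.io.IOException;
import java.util.Collections;

import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinder;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;
import opennlp.tools.util.TrainingParameters;

public class NameFinderTest {

    public static void main(String[] args) throws IOException {
        // Unchanged training data file
        String trainingDataFilePath = "opennlp/tools/namefind/AnnotatedSentences.txt";
        String encoding = "ISO-8859-1";
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(
                new PlainTextByLineStream(
                        new MarkableFileInputStreamFactory(new File(trainingDataFilePath)), encoding));

        // Train for up to 70 iterations with a feature cutoff of 1
        TrainingParameters params = new TrainingParameters();
        params.put(TrainingParameters.ITERATIONS_PARAM, 70);
        params.put(TrainingParameters.CUTOFF_PARAM, 1);

        TokenNameFinderModel nameFinderModel = NameFinderME.train("eng", null, sampleStream, params,
                TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
        TokenNameFinder nameFinder = new NameFinderME(nameFinderModel);

        // Now test whether it detects the names in the sample sentence
        String[] sentence = {"Alisa", "and", "Mike", "appreciated", "the", "hint", "and",
                "enjoyed", "a", "delicious", "traditional", "meal."};
        Span[] names = nameFinder.find(sentence);

        if (names != null && names.length != 0) {
            System.out.println(" > Found [" + names.length + "] results");
            int resultNo = 1;
            for (Span name : names) {
                // A Span covers tokens getStart() (inclusive) to getEnd() (exclusive)
                StringBuilder personName = new StringBuilder();
                for (int i = name.getStart(); i < name.getEnd(); i++) {
                    personName.append(sentence[i]).append(" ");
                }
                System.out.println(" > Result " + resultNo++ + ": Type: [" + name.getType()
                        + "] : PersonName: [" + personName + "]\t [probability=" + name.getProb() + "]");
            }
        } else {
            System.out.println(" > No results found");
        }
    }
}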
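The training logs below report "Number of Outcomes: 3", which fits the BioCodec used in the test code: each token is classified as the start of a name, the continuation of a name, or outside any name (assumed here to be labeled like default-start, default-cont and other). Each Span returned by find() then covers tokens from getStart() (inclusive) to getEnd() (exclusive). The standalone sketch below illustrates that decoding step with a hypothetical BioDecodeSketch class; it is a simplified stand-in, not OpenNLP's BioCodec, and it ignores name types. With per-token outcomes in which only "Mike" is tagged, exactly one span and therefore one result comes out, as in run (2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified BIO decoder (not OpenNLP's BioCodec).
// Spans are int[]{start, end} with an exclusive end, like opennlp.tools.util.Span.
public class BioDecodeSketch {

    // Turn outcomes such as "default-start", "default-cont", "other" into spans.
    static List<int[]> decode(String[] outcomes) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // index where the currently open name began, or -1
        for (int i = 0; i < outcomes.length; i++) {
            boolean isStart = outcomes[i].endsWith("-start");
            // A "cont" outcome only extends a name if one is already open
            boolean isCont = outcomes[i].endsWith("-cont") && start >= 0;
            if (!isCont && start >= 0) {
                spans.add(new int[] {start, i}); // close the open name before token i
                start = -1;
            }
            if (isStart) {
                start = i; // open a new name at token i
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, outcomes.length}); // name runs to the end
        }
        return spans;
    }

    // Join the covered tokens with single spaces (no trailing space).
    static String coveredText(String[] tokens, int[] span) {
        return String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1]));
    }

    public static void main(String[] args) {
        String[] sentence = {"Alisa", "and", "Mike", "appreciated", "the", "hint",
                "and", "enjoyed", "a", "delicious", "traditional", "meal."};
        // Hypothetical outcomes in which only "Mike" is tagged, matching result (2)
        String[] outcomes = new String[sentence.length];
        Arrays.fill(outcomes, "other");
        outcomes[2] = "default-start";

        int resultNo = 1;
        for (int[] span : decode(outcomes)) {
            // prints "Result 1: [Mike]"
            System.out.println("Result " + resultNo++ + ": [" + coveredText(sentence, span) + "]");
        }
    }
}
```

If "Alisa" had also been tagged default-start, decode() would return two spans and the loop would print two results.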
Result for (1):
Indexing events with TwoPass using cutoff of 1
Computing event counts... done. 1392 events
Indexing... done.
Collecting events... Done indexing in 0.22 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1392
Number of Outcomes: 3
Number of Predicates: 9164
Computing model parameters...
Performing 70 iterations.
1: . (1355/1392) 0.9734195402298851
2: . (1383/1392) 0.9935344827586207
3: . (1390/1392) 0.9985632183908046
4: . (1390/1392) 0.9985632183908046
5: . (1391/1392) 0.9992816091954023
6: . (1392/1392) 1.0
7: . (1392/1392) 1.0
8: . (1392/1392) 1.0
9: . (1392/1392) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (1392/1392) 1.0
...done.
Found [1] results
Result 1: Type: [default] : PersonName: [Alisa ] [probability=0.5483001511243855]
Result for (2):
Indexing events with TwoPass using cutoff of 1
Computing event counts... done. 1392 events
Indexing... done.
Collecting events... Done indexing in 0.22 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1392
Number of Outcomes: 3
Number of Predicates: 9164
Computing model parameters...
Performing 70 iterations.
1: . (1355/1392) 0.9734195402298851
2: . (1383/1392) 0.9935344827586207
3: . (1390/1392) 0.9985632183908046
4: . (1390/1392) 0.9985632183908046
5: . (1391/1392) 0.9992816091954023
6: . (1392/1392) 1.0
7: . (1392/1392) 1.0
8: . (1392/1392) 1.0
9: . (1392/1392) 1.0
Stopping: change in training set accuracy less than 1.0E-5
Stats: (1392/1392) 1.0
...done.
Found [1] results
Result 1: Type: [default] : PersonName: [Mike ] [probability=0.460685209028902]
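Each numbered iteration line in the two training logs reads as correctly classified training events over all training events (1392, presumably one event per token, matching "Number of Event Tokens: 1392"), followed by that ratio as the training-set accuracy. For example:

$$
\frac{1355}{1392} \approx 0.97342, \qquad \frac{1383}{1392} \approx 0.99353, \qquad \frac{1392}{1392} = 1.0
$$

So from iteration 6 on, every training token is classified correctly, and the two logs are identical because both runs train on the same file with the same parameters; only the test sentence differs.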