Details
Type: Question
Status: Closed
Priority: Major
Resolution: Feedback Received
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: None
Environment: Ubuntu 16.04, Java 8
Description
Hello,
I have created the following training data.
train.txt
Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma .
il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
il mio cap è lo 00144 nella capitale e e il mio nome è <START:person> john <END> .
Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
And then this code:
test.java
Charset charset = Charset.forName("UTF-8");
ObjectStream<String> lineStream = new PlainTextByLineStream(
        new FileInputStream("/home/damiano/person.train"), charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

TokenNameFinderModel model;

Dictionary dictionary = new Dictionary();
dictionary.put(new StringList(new String[]{"giovanni"}));
dictionary.put(new StringList(new String[]{"maria"}));
dictionary.put(new StringList(new String[]{"luca"}));

BufferedOutputStream aa = null;

AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
        new AdaptiveFeatureGenerator[]{
            new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
            new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
            new OutcomePriorFeatureGenerator(),
            new PreviousMapFeatureGenerator(),
            new BigramNameFeatureGenerator(),
            new SentenceFeatureGenerator(true, false),
            new DictionaryFeatureGenerator("person", dictionary)
        });

try {
    model = NameFinderME.train("it", "person", sampleStream,
            TrainingParameters.defaultParams(), featureGenerator,
            Collections.<String, Object>emptyMap());
} finally {
    sampleStream.close();
}

// Save trained model
try (BufferedOutputStream modelOut = new BufferedOutputStream(
        new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
    model.serialize(modelOut);
}

// Read the trained model
try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {
    TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);
    NameFinderME nameFinder = new NameFinderME(nerModel, featureGenerator,
            NameFinderME.DEFAULT_BEAM_SIZE);

    String sentence[] = new String[]{
        "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
    };

    Span nameSpans[] = nameFinder.find(sentence);
    System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
}
When I try
"Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
it correctly detects "Damiano" as a person, but if I change it to:
"Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."
it does not detect "maria" as a person, even though I added "maria" to the dictionary, so I expected it to be found. Why not?
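One thing that can be checked without involving OpenNLP at all is which dictionary entries ever appear as an annotated name in the training data. The sketch below is plain Java using only the standard library; the class and method names (DictionaryCoverageCheck, annotatedNames, coveredEntries) are made up for illustration and are not part of OpenNLP. It only inspects the data. Whether low coverage is actually the reason "maria" is missed is an assumption, not something this check proves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class: compares dictionary entries
// against the names annotated in the training text.
public class DictionaryCoverageCheck {

    // Matches one annotated span such as "<START:person> Damiano <END>".
    private static final Pattern NAME_SPAN =
            Pattern.compile("<START:person>\\s+(.+?)\\s+<END>");

    /** Returns the annotated person names found in the training text, in order. */
    static List<String> annotatedNames(String trainingText) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME_SPAN.matcher(trainingText);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Returns the dictionary entries that occur as an annotated name (ignoring case). */
    static List<String> coveredEntries(List<String> dictionaryEntries, List<String> names) {
        List<String> covered = new ArrayList<>();
        for (String entry : dictionaryEntries) {
            for (String name : names) {
                if (entry.equalsIgnoreCase(name)) {
                    covered.add(entry);
                    break;
                }
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        // Training data copied from the post above.
        String train =
                "Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma .\n"
              + "il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .\n"
              + "il mio cap è lo 00144 nella capitale e e il mio nome è <START:person> john <END> .\n"
              + "Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .\n"
              + "Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .\n";

        // Same three entries that were put into the Dictionary in test.java.
        List<String> dictionaryEntries = Arrays.asList("giovanni", "maria", "luca");

        List<String> names = annotatedNames(train);
        System.out.println("Annotated names: " + names);
        System.out.println("Dictionary entries seen as names: "
                + coveredEntries(dictionaryEntries, names));
    }
}
```

On this data, only "giovanni" is both in the dictionary and annotated as a person, so the training set contains a single example where a dictionary match coincides with a name.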
Thanks!