[OPENNLP-859] Cannot get entities from trained model using DictionaryFeatureGenerator - ASF JIRA

Details

Type: Question
Status: Closed
Priority: Major
Resolution: Feedback Received
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: Name Finder
Labels: None
Environment: Ubuntu 16.04, Java 8

Description

Hello,
I have created the following training data.

train.txt

Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .
il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
il mio cap è lo 00144 nella capitale e e il mio nome è  <START:person> john <END> .
Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .

And then this code:

test.java

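        // Imports required by this snippet. The statements below belong inside
        // a method that declares "throws IOException".
        import java.io.BufferedOutputStream;
        import java.io.FileInputStream;
        import java.io.FileOutputStream;
        import java.io.InputStream;
        import java.nio.charset.Charset;
        import java.util.Arrays;
        import java.util.Collections;

        import opennlp.tools.dictionary.Dictionary;
        import opennlp.tools.namefind.NameFinderME;
        import opennlp.tools.namefind.NameSample;
        import opennlp.tools.namefind.NameSampleDataStream;
        import opennlp.tools.namefind.TokenNameFinderModel;
        import opennlp.tools.util.ObjectStream;
        import opennlp.tools.util.PlainTextByLineStream;
        import opennlp.tools.util.Span;
        import opennlp.tools.util.StringList;
        import opennlp.tools.util.TrainingParameters;
        import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;
        import opennlp.tools.util.featuregen.BigramNameFeatureGenerator;
        import opennlp.tools.util.featuregen.CachedFeatureGenerator;
        import opennlp.tools.util.featuregen.DictionaryFeatureGenerator;
        import opennlp.tools.util.featuregen.OutcomePriorFeatureGenerator;
        import opennlp.tools.util.featuregen.PreviousMapFeatureGenerator;
        import opennlp.tools.util.featuregen.SentenceFeatureGenerator;
        import opennlp.tools.util.featuregen.TokenClassFeatureGenerator;
        import opennlp.tools.util.featuregen.TokenFeatureGenerator;
        import opennlp.tools.util.featuregen.WindowFeatureGenerator;

        Charset charset = Charset.forName("UTF-8");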
        ObjectStream<String> lineStream =
                        new PlainTextByLineStream(new FileInputStream("/home/damiano/person.train"), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;

        Dictionary dictionary = new Dictionary();
        dictionary.put(new StringList(new String[]{"giovanni"}));
        dictionary.put(new StringList(new String[]{"maria"}));
        dictionary.put(new StringList(new String[]{"luca"}));
      
        AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
                 new AdaptiveFeatureGenerator[]{                                 
                    new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
                    new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
                    new OutcomePriorFeatureGenerator(),
                    new PreviousMapFeatureGenerator(),
                    new BigramNameFeatureGenerator(),
                    new SentenceFeatureGenerator(true, false),
                    new DictionaryFeatureGenerator("person", dictionary)
                   });

        try {
            model = NameFinderME.train("it", "person", sampleStream, TrainingParameters.defaultParams(),
                    featureGenerator, Collections.<String, Object>emptyMap());
        }
        finally {
          sampleStream.close();
        }

        // Save trained model
        try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
          model.serialize(modelOut);
        }
                
        // Read the trained model
        try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {

            TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);

            NameFinderME nameFinder = new NameFinderME(nerModel, featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);
          
            String[] sentence = new String[]{
                "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
            };

            Span[] nameSpans = nameFinder.find(sentence);
          
            System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
        }

When I try the sentence

"Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."

it correctly detects "Damiano" as a person, but if I change the input to:

"Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."

it does not detect "maria" as a person, even though I added "maria" to the dictionary, so I expected it to be found. Why not?

Thanks!
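
One quick check is to count how often each dictionary entry actually occurs in the training lines, since that is the only evidence the model gets for the dictionary feature. The sketch below is plain Java and does not call OpenNLP: the class name `DictionaryCoverage` and the method `countHits` are hypothetical, introduced only for this illustration, and tokens are matched exactly, which may differ from how `Dictionary` compares entries. Whether low coverage explains the missed "maria" is an assumption to verify (for example, against the feature cutoff in the training parameters), not a confirmed diagnosis.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DictionaryCoverage {

    // Counts how often each dictionary entry appears as a whitespace-separated
    // token in the training lines. Markup tokens such as <START:person> and
    // <END> never equal a dictionary entry, so they are skipped implicitly.
    static Map<String, Integer> countHits(List<String> trainingLines,
                                          List<String> dictionaryEntries) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String entry : dictionaryEntries) {
            hits.put(entry, 0);
        }
        for (String line : trainingLines) {
            for (String token : line.trim().split("\\s+")) {
                if (hits.containsKey(token)) {
                    hits.merge(token, 1, Integer::sum);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The five training sentences from this report ("\u00e8" is the letter e with grave accent).
        List<String> trainingLines = Arrays.asList(
            "Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .",
            "il mio indirizzo \u00e8 via del <START:person> Corso <END> nella provincia di Roma .",
            "il mio cap \u00e8 lo 00144 nella capitale e e il mio nome \u00e8  <START:person> john <END> .",
            "Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> \u00e8 il mio amico .",
            "Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio ."
        );
        // The three entries put into the Dictionary in the snippet above.
        List<String> dictionaryEntries = Arrays.asList("giovanni", "maria", "luca");

        // Prints {giovanni=1, maria=0, luca=0}
        System.out.println(countHits(trainingLines, dictionaryEntries));
    }
}
```

With this data, only "giovanni" occurs in the training set, and only once; "maria" and "luca" never do.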

People

Assignee: Unassigned
Reporter: Damiano Porta
Votes: 0
Watchers: 3

Dates

Created: 16/Aug/16 15:20
Updated: 16/Dec/22 12:45
Resolved: 16/Dec/22 12:45