OpenNLP / OPENNLP-859

Cannot get entities from trained model using DictionaryFeatureGenerator

Details

    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Feedback Received
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Name Finder
    • Labels: None
    • Environment: Ubuntu 16.04, Java 8

    Description

      Hello,
      I have created the following training data.

      train.txt
      Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .
      il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
      il mio cap è lo 00144 nella capitale e e il mio nome è  <START:person> john <END> .
      Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
      Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
      

      And then this code:

      test.java
              Charset charset = Charset.forName("UTF-8");
              ObjectStream<String> lineStream =
                              new PlainTextByLineStream(new FileInputStream("/home/damiano/person.train"), charset);
              ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
      
              TokenNameFinderModel model;
      
              // Dictionary of known person names, passed to DictionaryFeatureGenerator below
              Dictionary dictionary = new Dictionary();
              dictionary.put(new StringList(new String[]{"giovanni"}));
              dictionary.put(new StringList(new String[]{"maria"}));
              dictionary.put(new StringList(new String[]{"luca"}));
            
              AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
                       new AdaptiveFeatureGenerator[]{                                 
                          new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
                          new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
                          new OutcomePriorFeatureGenerator(),
                          new PreviousMapFeatureGenerator(),
                          new BigramNameFeatureGenerator(),
                          new SentenceFeatureGenerator(true, false),
                          new DictionaryFeatureGenerator("person", dictionary)
                         });
      
              try {
                  model = NameFinderME.train("it", "person", sampleStream, TrainingParameters.defaultParams(),
                          featureGenerator, Collections.<String, Object>emptyMap());
              }
              finally {
                sampleStream.close();
              }
      
              // Save trained model
              try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
                model.serialize(modelOut);
              }
                      
              // Read the trained model
              try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {
      
                  TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);
      
                  NameFinderME nameFinder = new NameFinderME(nerModel, featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);
                
                  String[] sentence = new String[]{
                      "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
                  };
                  
                  Span[] nameSpans = nameFinder.find(sentence);
                
                  System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
              }      
      

      When I try

      "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
      

      it correctly detects "Damiano" as PERSON, but if I change it to:

      "Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."
      

      it does not detect "maria" as PERSON, even though I added "maria" to the dictionary, so I expected it to be found. Why not?

      Thanks!
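
      One thing that may be worth checking here is how often the dictionary entries actually occur in the training data. As far as I understand it, DictionaryFeatureGenerator only adds an extra feature for tokens found in the dictionary, and the model still has to learn from the training samples what that feature means. The sketch below is plain Java, not OpenNLP API (the class and method names are hypothetical), and it assumes whitespace tokenization and case-insensitive matching. It counts how many training tokens match each dictionary entry:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: counts dictionary hits in the training text.
final class DictionaryCoverage {

    // For each dictionary entry, count the whitespace-separated training tokens
    // that equal it, ignoring case. Tags such as <START:person> never match.
    static Map<String, Integer> countOccurrences(List<String> dictionary, List<String> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String entry : dictionary) {
            counts.put(entry.toLowerCase(Locale.ROOT), 0);
        }
        for (String sentence : sentences) {
            for (String token : sentence.trim().split("\\s+")) {
                String key = token.toLowerCase(Locale.ROOT);
                if (counts.containsKey(key)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("giovanni", "maria", "luca");
        // The five training sentences from above (\u00e8 is the letter "è").
        List<String> sentences = Arrays.asList(
            "Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .",
            "il mio indirizzo \u00e8 via del <START:person> Corso <END> nella provincia di Roma .",
            "il mio cap \u00e8 lo 00144 nella capitale e e il mio nome \u00e8  <START:person> john <END> .",
            "Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> \u00e8 il mio amico .",
            "Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .");
        // prints {giovanni=1, maria=0, luca=0}
        System.out.println(countOccurrences(dictionary, sentences));
    }
}
```

      Only "giovanni" appears in the training data, and only once, so the dictionary feature is seen a single time during training. If the default training parameters apply a feature cutoff above 1 (I believe TrainingParameters.defaultParams() does, but that is an assumption to verify for the version in use), a feature seen once would be dropped and could not help with "maria".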

          People

            Assignee: Unassigned
            Reporter: Damiano Porta (damianoporta)
            Votes: 0
            Watchers: 3
