OpenNLP issue OPENNLP-48

Write documentation for the coreference component

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Documentation
    • Labels: None

      Description

      As part of the coref refactoring, documentation should be written that explains
      how to use and train the coreference component.


          Activity

          Rengarajan Seshadri added a comment -

          Hi All,

          This is Rengarajan (Renga). I have been using OpenNLP for quite some time; it is very helpful and I am
          very happy with it.

          I have been trying to implement anaphora resolution (mostly pronoun resolution) in one of my projects.
          When I started looking for this feature, I found that the coref package of OpenNLP offers a solution. However, the
          documentation is missing, as the title of this JIRA issue rightly notes.

          I therefore decided to study the code. I checked out the code from the SVN repository, intending to
          read through it and test it before starting on the documentation myself. Unfortunately, I don't see the coref
          package in the source (trunk).

          I used the link given on the page "http://opennlp.apache.org/source-code.html", that is:

          svn co https://svn.apache.org/repos/asf/opennlp/trunk/

          The SVN checkout completed without any errors, but I don't see the coref source under the "opennlp.tools" folder, as I do in the
          "opennlp-tools-1.5.3.jar" library of the binary distribution. Am I missing something? Can someone help me?

          thanks
          Renga

          Joern Kottmann added a comment -

          Hello,

          That sounds really great. We could use someone to take care of the coref component.
          The code was moved out of opennlp-tools to the sandbox.

          It is now located here:
          https://svn.apache.org/repos/asf/opennlp/sandbox/opennlp-coref/

          We currently don't have a data set to train it on, which means we also can't really test
          any modifications we make to it.
          A good corpus is probably OntoNotes. I worked a bit on training it on that, but did not succeed and
          never finished the work.

          Do you have some data for training?

          Thanks,
          Jörn

          Rengarajan Seshadri added a comment -

          Hi Joern,

          Thanks for providing me with the link to the source. I had some minor difficulty compiling it due to the import statements. Once I changed them to refer to the released (1.5.3) opennlp-tools jars, I was able to build with no errors. I hope those changes are fine.

          On the question of data set, I don't have any.

          I have some test cases. I wanted to use the coreference component to test whether I could use it further in my other projects; that is how I began. From your observations, more thorough testing is needed, and some modifications may also be due.

          For the last few days I have been trying to understand the code (and the theory behind it), with little progress.
          Can you point me to the work or papers on coreference that this implementation is based on?

          On the data side, when you refer to OntoNotes, do you mean the data available at
          http://conll.cemantix.org/2012/data.html ?

          thanks
          Renga

          Joern Kottmann added a comment -

          I was referring to this data:
          http://catalog.ldc.upenn.edu/LDC2011T03

          It has since been released in version 5.0, and it is probably better to get that instead. The OpenNLP coref component only works on noun phrases; if I recall correctly, the OntoNotes data annotates more than that, which made it challenging to determine which phrases to use for training and which not.

          I will provide you with more details soon; I will have to dig into the code again.

          Rengarajan Seshadri added a comment -

          I have requested a license for the OntoNotes data from LDC. According to their reply, the latest release is
          http://catalog.ldc.upenn.edu/LDC2013T19
          and I will be able to use that data for the training and testing cycle of the coreference module.
          Once I have the data, I will let you know and ask for your help as needed.

          Pavlina Fragkou added a comment -

          Dear Sirs/Madams,
          I am trying to run version 1.5.3 on Windows 7. More specifically, I am interested in running the NE and coreference resolution tools.
          In order to run the tools, I set the classpath as follows: set CLASSPATH = lib\jwnl-1.3.3.jar;lib\opennlp-maxent-3.0.3.jar;lib\opennlp-tools-1.5.3.jar;lib\opennlp-uima-1.5.3.jar

          I successfully ran the following:
          java -jar lib\opennlp-tools-1.5.3.jar SentenceDetector models\en-sent.bin < 1.txt | java -jar lib\opennlp-tools-1.5.3.jar TokenizerME models\en-token.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-person.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-date.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-location.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-money.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-percentage.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-organization.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-time.bin | java -jar lib\opennlp-tools-1.5.3.jar POSTagger models\en-pos-maxent.bin | java -jar lib\opennlp-tools-1.5.3.jar ChunkerME models\en-chunker.bin

          However, I encounter some problems while running the coreference resolution interface. I should say that I do not wish to use the CoreferencerTrainer or the CoreferenceConverter, since I would like to use an already trained model.

          I suppose I must run a command like the following:

          java -jar lib\opennlp-tools-1.5.3.jar SentenceDetector models\en-sent.bin < 1.txt | java -jar lib\opennlp-tools-1.5.3.jar TokenizerME models\en-token.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-person.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-date.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-location.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-money.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-percentage.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-organization.bin | java -jar lib\opennlp-tools-1.5.3.jar TokenNameFinder models\en-ner-time.bin | java -jar -DWNSEARCHDIR=C:\Wordnet lib\opennlp-tools-1.5.3.jar Coreferencer models\coreference

          where -DWNSEARCHDIR is the directory where Wordnet resides.

          However, this does not work: no output is produced.
          Can you please provide an example of how to run the coreference tool and which other tools must run before it? Are the POSTagger and ChunkerME necessary?

          Additionally, WordNet 3.0 is not provided for Windows. Is there a problem if I use the Unix version (just unzipping it), or can I use WordNet version 2.1? According to http://wordnetcode.princeton.edu/3.0/CHANGES, the following changes were made between WordNet 2.1 and 3.0:

          " Some changes were made to the graphical interface and WordNet library
          with regard to adjective and adverb searches. The adjective search
          "Synonyms/Related Nouns" was relabeled "Synonyms", and, similarly, the
          adverb search "Synonyms/Stem Adjectives" was relabled "Synonyms". A
          separate "Related Noun" search was inserted for adjectives, and a
          separate "Base Adjective" search was added for adverbs. "

          Thank you in advance,
          Pavlina


            People

            • Assignee: Unassigned
            • Reporter: Joern Kottmann
            • Votes: 1
            • Watchers: 5
