OpenNLP / OPENNLP-200

Addition of prepositional phrase attachment dataset and unit test for it

    Details

      Description

      I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:

      http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0

      1. OPENNLP-200.patch
        3.05 MB
        Joern Kottmann
      2. ppa.tar.gz
        761 kB
        Adwait Ratnaparkhi

        Activity

        Joern Kottmann added a comment -

        Added more tests for the perceptron training, and added a test for maxent training.

        Joern Kottmann added a comment -

        I fixed the issues mentioned above, refactored the test a little, and added an additional test for maxent.

        We should add more tests that exercise the training code with various settings.

        Joern Kottmann added a comment -

        The test reads the data set using the platform default encoding. Since the default encoding depends on the platform and locale, this test may fail or produce different results on other machines.

        To fix this, always specify the encoding when opening the data. The data should also be retrieved via the class path instead.
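
As an illustration of the fix described above, here is a minimal sketch. It is not the code from the attached patch. The class name `PpaDataReader` and the resource path are hypothetical, and UTF-8 is an assumed encoding for the data set. It loads the data through the class path and names the charset explicitly, so the result does not depend on the machine's default encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of OpenNLP.
final class PpaDataReader {

    // Hypothetical resource path; the real location in the test tree may differ.
    static final String TRAINING_RESOURCE = "/data/ppa/training";

    // Reads all lines from the stream, decoding with an explicit charset
    // (UTF-8 is an assumption) instead of the platform default encoding.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Looks the resource up on the class path rather than via a file system path,
    // so the test does not depend on the working directory.
    static List<String> readResource(String name) throws IOException {
        InputStream in = PpaDataReader.class.getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on class path: " + name);
        }
        return readLines(in);
    }
}
```

A test would then call something like `PpaDataReader.readResource(PpaDataReader.TRAINING_RESOURCE)`. On Java versions without `StandardCharsets`, passing the charset name `"UTF-8"` to the `InputStreamReader` constructor has the same effect.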

        Adwait Ratnaparkhi added a comment -

        Prepositional Phrase Attachment Dataset from

        Ratnaparkhi, Reynar, & Roukos. "A Maximum Entropy Model for Prepositional Phrase Attachment". ARPA HLT 1994.

        Joern Kottmann added a comment -

        The patch contains the rolled-back change and should be applied again when the IP clearance is done.

        Joern Kottmann added a comment -

        OK, then let's remove it from our svn repository and attach it as a patch to this issue. When the IP clearance is done, we can commit the patch.

        Jason Baldridge added a comment -

        +1 Fine to remove it.

        Sorry not to have moved on this. I've been busy, and you said to discuss the
        issues on the list, and I had referential failure and didn't get back to it.

        2011/7/1 Jörn Kottmann (JIRA) <jira@apache.org>


        Jason Baldridge
        Assistant Professor, Department of Linguistics
        The University of Texas at Austin
        http://www.jasonbaldridge.com
        http://twitter.com/jasonbaldridge

        Joern Kottmann added a comment -

        This issue is actually a release blocker, because we cannot release things which are not IP cleared.
        Basically this leaves us with two options: get the IP clearance done soon, or defer.
        I am actually +1 to defer. That would mean removing the test and data, waiting until the IP clearance is done, and then adding them again. The reason I would like to defer is that I fear the clearance will take too long and put us in a state where we cannot release.

        Yeah, I also think that doing all this paper stuff is annoying, and that it sucks to remove this nice test, but those are the rules the ASF agreed on, and which we have to follow as an ASF project.

        Joern Kottmann added a comment -

        I am not sure what the process is in this case; maybe the original creator of the data has to sign an SGA. Please discuss the issue on the mailing list.

        Jason Baldridge added a comment -

        No... what is the procedure?

        2011/6/22 Jörn Kottmann (JIRA) <jira@apache.org>


        Jason Baldridge
        Assistant Professor, Department of Linguistics
        The University of Texas at Austin
        http://www.jasonbaldridge.com
        http://twitter.com/jasonbaldridge

        Joern Kottmann added a comment -

        Any updates here on the IP clearance?


          People

          • Assignee:
            Joern Kottmann
            Reporter:
            Jason Baldridge
