OpenNLP
OPENNLP-155

unreliable training set accuracy in perceptron

    Details

      Description

      The training accuracies reported during perceptron training were much higher than final training accuracy, which turned out to be an artifact of the way training examples were ordered.

        Activity

        Jason Baldridge added a comment -

        I changed this so that after each iteration, the training accuracy is scored without changing the parameters. This gives a coherent value reported on every iteration, and it also allows early stopping by checking whether the same accuracy has been obtained some number of times (e.g. 4) in a row. (This could also be done by checking that the parameter values haven't changed, which would be better, but which I'd only want to do after refactoring.)
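        (A minimal sketch of that early-stopping idea, with hypothetical helper names rather than the
        actual PerceptronTrainer code: after each iteration the training set is scored read-only, and
        training stops once the same accuracy has been seen several times in a row.)

        // Illustrative sketch only; runTrainingIteration and scoreTrainingSet are assumed helpers.
        double prevAccuracy = -1.0;
        int sameCount = 0;
        for (int iteration = 0; iteration < maxIterations; iteration++) {
          runTrainingIteration(iteration);        // updates the parameters
          double accuracy = scoreTrainingSet();   // read-only pass, parameters unchanged
          if (accuracy == prevAccuracy) {
            sameCount++;
            if (sameCount >= 4) break;            // e.g. the same accuracy 4 times in a row
          } else {
            sameCount = 1;
          }
          prevAccuracy = accuracy;
        }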

        Joern Kottmann added a comment -

        I assume the updates you did for this change went into the PerceptronTrainer class.

        After applying this change and running the evaluation on a POS data set, I noticed that the accuracy is 2% higher than before. Can you quickly explain how this change affects the accuracy?

        BTW, the diff is difficult to read because you changed a lot of whitespace and did minor reformatting here and there.

        Joern Kottmann added a comment -

        Why did you introduce a new variable oei in the trainingStats methods? Isn't oei always equal to the previously used ei?
        If so, I suggest we remove oei again and only use ei, as before.

        Joern Kottmann added a comment -

        I did a little more testing; it turns out that this change increased the training time on my test set from 9 minutes to 16 minutes.
        I believe it's better to have this fix, because then the stopping criterion works correctly and avoids overtraining the model, right?

        Anyway, I think it should not change the accuracy of the trained model. I ran 100 iterations in both cases, and the difference during evaluation was 2 percent. I would like to find out why we get this difference and not an exactly identical model. The stopping criterion didn't terminate the training in either of my runs, so I think it should not be linked to overtraining.

        Joern Kottmann added a comment -

        One more question: in line 178 it says:

        if (currAccuracy == prevAccuracy)

        Don't we need a small delta when checking these two double values for equality?
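        (A tolerance-based check along these lines would avoid exact floating-point comparison; the
        tolerance value here is only illustrative, not taken from the actual code.)

        // Illustrative alternative to the exact equality check on line 178.
        final double TOLERANCE = 0.00001;
        if (Math.abs(currAccuracy - prevAccuracy) < TOLERANCE) {
          // treat the accuracy as unchanged
        }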

        Jason Baldridge added a comment -

        2011/4/22 Jörn Kottmann (JIRA) <jira@apache.org>

        Correct.

        Great! The basic problem was that the model was probably stopping too soon because of the
        failure to obtain the actual accuracy on the training set. Basically, it thought it was doing
        very well, when in fact that value was too high because of the sorted way examples were
        presented to the training algorithm.

        > BTW, the diff is difficult to read because you changed many white spaces and

        Sorry – the code was hard to wade through, and reorganizing it helped me see what was going on.
        I also got rid of unnecessary code duplication by defining a variable updateValue that is +1
        for the correct label and -1 for the incorrect labels. That turned this:

        for (int oi = 0; oi < numOutcomes; oi++) {
          if (oi == outcomeList[oei]) {
            if (modelDistribution[oi] <= 0) {
              for (int ci = 0; ci < contexts[ei].length; ci++) {
                int pi = contexts[ei][ci];
                if (values == null) {
                  params[pi].updateParameter(oi, 1);
                }
                else {
                  params[pi].updateParameter(oi, values[ei][ci]);
                }
                if (useAverage) {
                  if (updates[pi][oi][VALUE] != 0) {
                    averageParams[pi].updateParameter(oi, updates[pi][oi][VALUE]
                        * (numEvents * (iteration - updates[pi][oi][ITER]) + (ei - updates[pi][oi][EVENT])));
                  }
                  updates[pi][oi][VALUE] = (int) params[pi].getParameters()[oi];
                  updates[pi][oi][ITER] = iteration;
                  updates[pi][oi][EVENT] = ei;
                }
              }
            }
          }
          else {
            if (modelDistribution[oi] > 0) {
              for (int ci = 0; ci < contexts[ei].length; ci++) {
                int pi = contexts[ei][ci];
                if (values == null) {
                  params[pi].updateParameter(oi, -1);
                }
                else {
                  params[pi].updateParameter(oi, -1 * values[ei][ci]);
                }
                if (useAverage) {
                  if (updates[pi][oi][VALUE] != 0) {
                    averageParams[pi].updateParameter(oi, updates[pi][oi][VALUE]
                        * (numEvents * (iteration - updates[pi][oi][ITER]) + (ei - updates[pi][oi][EVENT])));
                  }
                  updates[pi][oi][VALUE] = (int) params[pi].getParameters()[oi];
                  updates[pi][oi][ITER] = iteration;
                  updates[pi][oi][EVENT] = ei;
                }
              }
            }
          }
        }

        into this:

        for (int oi = 0; oi < numOutcomes; oi++) {
          int updateValue = -1;
          if (oi == outcomeList[oei])
            updateValue = 1;

          if (modelDistribution[oi] * updateValue <= 0) {
            for (int ci = 0; ci < contexts[ei].length; ci++) {
              int pi = contexts[ei][ci];
              if (values == null)
                params[pi].updateParameter(oi, updateValue);
              else
                params[pi].updateParameter(oi, updateValue * values[ei][ci]);

              if (useAverage) {
                if (updates[pi][oi][VALUE] != 0) {
                  averageParams[pi].updateParameter(oi, updates[pi][oi][VALUE]
                      * (numEvents * (iteration - updates[pi][oi][ITER]) + (ei - updates[pi][oi][EVENT])));
                }
                updates[pi][oi][VALUE] = (int) params[pi].getParameters()[oi];
                updates[pi][oi][ITER] = iteration;
                updates[pi][oi][EVENT] = ei;
              }
            }
          }
        }

        -Jason



        Jason Baldridge added a comment -

        2011/4/22 Jörn Kottmann (JIRA) <jira@apache.org>

        Yes. However, we can certainly fix this so that it is both fast and correct. I just coded it
        to get the right answer, but it is essentially doing double work now.

        Hmm... so there is actually an odd aspect of how the perceptron is implemented that isn't the
        textbook way. The trigger for whether to update the parameters is if the correct label is
        assigned a score <= zero, and if any incorrect label gets a score > zero. Normally, an update
        happens whenever an incorrect label gets a higher score than the correct label, regardless of
        positivity or negativity. Anyway, I changed it so that the same code is used, based on the
        updateValue variable. What that means is that now incorrect labels get updated when their score
        is zero. Otherwise the code should be the same. But that is the likely difference, because
        initially the scores of many examples will be zero, and then the updates made in the first pass
        are different from the previous version. You could test that by changing this line in the
        previous version
        (http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-maxent/src/main/java/opennlp/perceptron/PerceptronTrainer.java?revision=1049002&view=markup):

        else {
          if (modelDistribution[oi] > 0) {

        to be >= 0 instead.

        Jason
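        (For contrast, here is a minimal sketch of the textbook multiclass update described above:
        update only when the predicted label differs from the gold label. It reuses variable names from
        the snippets above, but maxIndex is an assumed helper here, not the actual trainer code.)

        // Illustrative sketch of the standard perceptron update, not the actual PerceptronTrainer code.
        int predicted = maxIndex(modelDistribution);   // index of the highest-scoring label (assumed helper)
        int gold = outcomeList[oei];
        if (predicted != gold) {
          for (int ci = 0; ci < contexts[ei].length; ci++) {
            int pi = contexts[ei][ci];
            params[pi].updateParameter(gold, 1);       // promote the gold label's parameters
            params[pi].updateParameter(predicted, -1); // demote the wrongly predicted label's parameters
          }
        }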



        Jason Baldridge added a comment -

        That would be fine to do. This is a check that approximates whether the
        parameters have remained stable for multiple iterations. It would probably
        be better to allow equality plus or minus a small delta.

        2011/4/22 Jörn Kottmann (JIRA) <jira@apache.org>



        Joern Kottmann added a comment -

        Thanks for the explanations, Jason.

        Joern Kottmann added a comment -

        > BTW, the diff is difficult to read because you changed many white spaces and
        > Sorry – the code was hard to wade through, and reorganizing it helped me
        > see what was going on. I also got rid of unnecessary code duplication by
        > defining a variable updateValue that is +1 for the correct label and -1 for
        > the incorrect labels

        When you do the refactoring (which influenced the accuracy), the reformatting, and the actual
        change all at once, it is hard for someone else looking at it to understand what actually
        happened in the end.

        > Yes. However, we can certainly fix this so that it is both fast and correct.
        > I just coded it to get the right answer, but it is essentially doing double
        > work now.

        The focus should be on correctness, so it is now better than before. But how could it be made
        faster again, given that we now compute the training stats after every iteration, compared to
        just once in the previous implementation?

        Joern Kottmann added a comment -

        In some reformatted places, tabs were used instead of spaces to indent the code; in OpenNLP we indent with 2 spaces. I replaced the tabs with spaces.

        Jason Baldridge added a comment -

        2011/4/26 Jörn Kottmann (JIRA) <jira@apache.org>

        Sorry – will keep that in mind in the future. I sort of did it while I was
        going through the code trying to simplify and understand what was going
        wrong. The better solution would have been to document my thought processes
        and changes as I was doing that and attach it to the JIRA (and create the
        JIRA in the first place). I'll be better about it in the future.

        If the examples are not presented in sorted order to the training algorithm,
        it should work to compute accuracy during the iteration. At least, that is
        how it has generally worked out when I've implemented perceptrons in the
        past in other contexts. So, I think there is just an interaction with how
        contexts are collapsed and sorted in the maxent code.



        Jason Baldridge added a comment -

        There were some problems with the implementation, both in the original one and in some of the changes I had made to address this issue. A code commit with an extensive rewrite of the perceptron will follow, with copious notes describing the changes.

        Jason Baldridge added a comment -

        Here are my notes on this issue. Also, see the comments in PerceptronTrainer.java. Please try out the perceptron on your datasets and let me know how it goes! -Jason

        • Changed the update to be the actual perceptron update: when a label
          that is not the gold label is chosen for an event, the parameters
          associated with that label are decremented, and the parameters
          associated with the gold label are incremented. I checked this
          empirically on several datasets, and it works better than the
          previous update (and it involves fewer updates).
        • stepsize is decreased to stepsize/1.05 on every iteration, ensuring
          better stability toward the end of training. This is actually the
          main reason that the training set accuracy obtained during parameter
          update continued to be different from that computed when parameters
          aren't updated. Now, the parameters don't jump as much in later
          iterations, so things settle down and those two accuracies converge
          if enough iterations are allowed.
        • Training set accuracy is computed once per iteration.
        • Training stops if the current training set accuracy changes less
          than a given tolerance from the accuracies obtained in each of the
          previous three iterations.
        • Averaging is done differently than before. Rather than doing an
          immediate update, parameters are simply accumulated after iterations
          (this makes the code much easier to understand/maintain). Also, not
          every iteration is used, as this tends to give too much weight to the
          final iterations, which don't actually differ that much from one
          another. I tried a few things and found a simple method that works
          well: sum the parameters from the first 20 iterations and then sum
          parameters from any further iterations that are perfect squares (25,
          36, 49, etc). This gets a good (diverse) sample of parameters for
          averaging since the distance between subsequent parameter sets gets
          larger as the number of iterations gets bigger (see the sketch after
          this list).
        • Added prepositional phrase attachment dataset to
          src/test/resources/data/ppa. This is done with permission from
          Adwait Ratnaparkhi – see the README for details.
        • Created unit test to check perceptron training consistency, using
          the prepositional phrase attachment data. It would be good to do the
          same for maxent.
        • Added ListEventStream to make a stream out of List<Event>
        • Added some helper methods, e.g. maxIndex, to simplify the code in
          the main algorithm.
        • The training stats aren't shown for every iteration. Now it is just
          the first 10 and then every 10th iteration after that.
        • modelDistribution, params, evalParams and others are no longer class
          variables. They have been pushed into the findParameters
          method. Other variables could/should be made non-global too, but
          leaving as is for now.
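
        (Below is a compact, illustrative sketch of the iteration schedule described in these notes:
        stepsize decay, averaging over the first 20 iterations plus perfect-square iterations, and the
        tolerance-based stop against the previous three accuracies. Names such as trainOneIteration and
        accumulateParametersForAveraging are assumed helpers, not the actual PerceptronTrainer code.)

        // Illustrative sketch of the training schedule, not the actual implementation.
        final double tolerance = 0.00001;                 // example value only
        double stepsize = 1.0;
        double[] prevAccuracies = new double[3];          // accuracies from the last three iterations
        for (int i = 1; i <= maxIterations; i++) {
          double accuracy = trainOneIteration(stepsize);  // one pass of perceptron updates (assumed helper)

          // Accumulate parameters for averaging: first 20 iterations, then perfect squares (25, 36, 49, ...).
          int root = (int) Math.sqrt(i);
          if (i <= 20 || root * root == i) {
            accumulateParametersForAveraging();           // assumed helper
          }

          // Stop once the accuracy changes less than the tolerance relative to each of the last three iterations.
          boolean stable = i > 3;
          for (double prev : prevAccuracies) {
            if (Math.abs(accuracy - prev) >= tolerance) {
              stable = false;
            }
          }
          if (stable) {
            break;
          }

          prevAccuracies[(i - 1) % 3] = accuracy;
          stepsize /= 1.05;                               // decay the step size after every iteration
        }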
        Jason Baldridge added a comment -

        Oops, should have used this to post my notes. Anyway, issue is resolved. The notes are the same as in the previous comment. -Jason

        Joern Kottmann added a comment -

        Refactoring changes have been moved to a new JIRA issue, OPENNLP-199.


          People

          • Assignee: Jason Baldridge
          • Reporter: Jason Baldridge