Mahout / MAHOUT-678

NullPointerException while using MixedGradient with SGD algorithm

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: Classification
    • Labels: None

      Description

      I am trying to use MixedGradient with the OnlineLogisticRegression algorithm, but I randomly get a NullPointerException whenever I set alpha to a value larger than 0.

      I checked the code and found that the RankingGradient used by MixedGradient assumes there are only two target categories rather than multiple. In addition, the rank gradient should only be used once the Gradient object has seen both positive and negative targets. I created a simple patch to make it work, but I do not understand the MixedGradient method deeply, so please check the patch carefully to see whether it really works correctly.
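
      The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch or Mahout's API: the class name GuardedMixedChoice, the choose method, and the "rank"/"basic" return values are all hypothetical, and the real gradient computation is left out. It only shows the decision of when a ranking update would be allowed.

```java
import java.util.Random;

// Hypothetical sketch: names are illustrative and not part of Mahout.
class GuardedMixedChoice {
  private final double alpha;   // probability of picking the ranking update
  private final Random random;
  private boolean hasZero;      // true once a target of 0 has been seen
  private boolean hasOne;       // true once a target of 1 has been seen

  GuardedMixedChoice(double alpha, long seed) {
    this.alpha = alpha;
    this.random = new Random(seed);
  }

  // Returns "rank" or "basic" to show which kind of update would be applied.
  String choose(int actual) {
    // Ranking needs at least one positive and one negative example in history.
    boolean canRank = hasZero && hasOne;
    if (canRank && random.nextDouble() < alpha) {
      return "rank";
    }
    // Otherwise fall back to the basic update and record what has been seen.
    hasZero |= actual == 0;
    hasOne |= actual == 1;
    return "basic";
  }
}
```

      With alpha set to 1.0, this sketch still returns "basic" until both a 0 and a 1 target have been observed, which is the condition the description says was missing.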

        Activity

        Stanley Xu added a comment -

        This is a quick and dirty fix. I think a real fix should also take performance into account, since the SGD algorithm is really fast.

        Sean Owen added a comment -

        Tiny suggestion: I bet you'll find that "hasOne = actual == 1;" is actually very marginally faster than the if statement (avoids a branch) and is a bit more compact. I also don't know enough to say whether this is the right change.

        Sean Owen added a comment -

        Er, I mean 'hasOne |= actual == 1', and that may be too clever.
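
        To make the difference between the two suggestions concrete, here is a small sketch comparing the if statement, the compound-OR form, and the plain assignment. The class FlagUpdateSketch and its method names are hypothetical and exist only for this comparison. The correction presumably matters because a plain assignment resets the flag on every call, while the flag is meant to stay true once a target of 1 has been seen.

```java
// Hypothetical sketch: compares three ways of updating a "seen a 1" flag.
class FlagUpdateSketch {
  // Original style: set the flag only when the target is 1.
  static boolean withBranch(boolean hasOne, int actual) {
    if (actual == 1) {
      hasOne = true;
    }
    return hasOne;
  }

  // Compound OR: == binds tighter than |=, so this is hasOne = hasOne | (actual == 1).
  static boolean withCompoundOr(boolean hasOne, int actual) {
    hasOne |= actual == 1;
    return hasOne;
  }

  // Plain assignment: overwrites the flag, so a later 0 clears an earlier 1.
  static boolean withPlainAssignment(boolean hasOne, int actual) {
    hasOne = actual == 1;
    return hasOne;
  }
}
```

        The first two methods agree for every input; the third differs when the flag is already true and the target is 0.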

        Ted Dunning added a comment -

        This needs fixing on at least two levels. First, the documentation should describe why only the binomial case makes sense for ranking. Second, the code should detect misuse and complain.
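
        One way the "detect misuse and complain" part could look is a fail-fast check on the target value. The sketch below is only an assumption about the shape of such a check: BinaryTargetCheck and requireBinary are hypothetical names, and treating the binomial targets as the integers 0 and 1 is inferred from the 'actual == 1' snippet in this thread rather than confirmed from the Mahout source.

```java
// Hypothetical sketch: rejects non-binary targets with a descriptive error.
class BinaryTargetCheck {
  // Returns the target unchanged if it is 0 or 1, otherwise throws.
  static int requireBinary(int actual) {
    if (actual != 0 && actual != 1) {
      throw new IllegalArgumentException(
          "Ranking gradient expects a binary target (0 or 1) but got " + actual);
    }
    return actual;
  }
}
```

        Calling such a check before a ranking update would turn the random NullPointerException into an immediate, explainable error for multi-category data.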

        Sean Owen added a comment -

        OK, I committed a variant of this patch to get this moving forward.

        Hudson added a comment -

        Integrated in Mahout-Quality #996 (See https://builds.apache.org/job/Mahout-Quality/996/)
        MAHOUT-678 commit reasonable band-aid proposed in the patch to resolve the proximate issue: no more NPE when not running on binomial data

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1160069
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/MixedGradient.java

          People

          • Assignee: Ted Dunning
          • Reporter: Stanley Xu
          • Votes: 0
          • Watchers: 0


          Time Tracking

          • Original Estimate: 2h
          • Remaining Estimate: 2h
          • Time Spent: Not Specified