Mahout / MAHOUT-839

rowid job failing (when parsing options)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels:

      Description

      Although MAHOUT-757 moved towards standard option naming, it uses different APIs for option parsing than other jobs.

      On my system, it dies reliably with a null pointer error. I reported this by mail here, but nobody else has reconfirmed it yet: http://permalink.gmane.org/gmane.comp.apache.mahout.user/9659

      Example:

      TellyClub:bin danbri$ ./mahout rowid --help

      MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
      MAHOUT_LOCAL is set, running locally

      [skipping some hopefully unrelated SLF4J errors re same thing on classpath twice]

      Exception in thread "main" java.lang.NullPointerException
      at org.apache.hadoop.fs.Path.<init>(Path.java:61)
      at org.apache.hadoop.fs.Path.<init>(Path.java:50)
      at org.apache.mahout.utils.vectors.RowIdJob.run(RowIdJob.java:49)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.mahout.utils.vectors.RowIdJob.main(RowIdJob.java:89)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
      at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

      1. MAHOUT-839-v2.patch
        7 kB
        Dan Brickley
      2. MAHOUT-839.patch
        7 kB
        Dan Brickley

        Activity

        Hudson added a comment -

        Integrated in Mahout-Quality #1115 (See https://builds.apache.org/job/Mahout-Quality/1115/)
        MAHOUT-839 call parseArguments() to make sure all args are ready for the job to use

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188038
        Files :

        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/RowIdJob.java
        Grant Ingersoll added a comment -

        Map<String,String> parsedArgs = parseArguments(args);
        if (parsedArgs == null) {
          return -1;
        }
        ...

        is the standard way of doing this.
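
        To illustrate the pattern above without pulling in Mahout or Hadoop, here is a minimal, self-contained sketch. The class name ParseFirstSketch and its simplified parseArguments are hypothetical stand-ins, not the real AbstractJob API; the point is only that the job parses first, returns -1 when parsing yields null (for example on --help), and reads its options afterwards.

        ```java
        import java.util.HashMap;
        import java.util.Map;

        public class ParseFirstSketch {

            // Hypothetical stand-in for a job's argument parser (not Mahout's).
            // Returns null when --help is requested or a required option is missing.
            static Map<String, String> parseArguments(String[] args) {
                Map<String, String> parsed = new HashMap<>();
                for (int i = 0; i < args.length; i++) {
                    if ("--help".equals(args[i])) {
                        System.out.println("Usage: --input <path> --output <path>");
                        return null;
                    }
                    if (args[i].startsWith("--") && i + 1 < args.length) {
                        parsed.put(args[i], args[i + 1]);
                        i++; // skip the value we just consumed
                    }
                }
                if (!parsed.containsKey("--input") || !parsed.containsKey("--output")) {
                    System.err.println("Missing --input or --output");
                    return null;
                }
                return parsed;
            }

            // Parse first, bail out on null, and only then touch the options.
            static int run(String[] args) {
                Map<String, String> parsedArgs = parseArguments(args);
                if (parsedArgs == null) {
                    return -1;
                }
                String input = parsedArgs.get("--input");
                String output = parsedArgs.get("--output");
                System.out.println("Would read " + input + " and write " + output);
                return 0;
            }

            public static void main(String[] args) {
                System.exit(run(args));
            }
        }
        ```

        Skipping the parse step leaves the input and output values null, which is consistent with the failure reported in the description, where a path is built from a value that was never set.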

        Grant Ingersoll added a comment -

        I didn't run the code, but looking at it, yeah, it seems like it never parses the arguments, so it would definitely throw an NPE on line 49.

        Dan Brickley added a comment -

        OK, should've asked first?

        It's working for me here now with the (wrong) fix. But are you confirming that it doesn't currently work for you at least?

        Grant Ingersoll added a comment -

        Also, for future reference, there's no need to name patches as v2, etc. JIRA will always show the latest one when they have the same name, which makes it easier to find which one to use.

        Grant Ingersoll added a comment -

        Hey Dan,

        I think using addInputOption and addOutputOption is the preferred way. I think the thing that is missing is the parseArguments call. See the KMeansDriver.

        Dan Brickley added a comment -

        now checks for --help and bails appropriately

        Dan Brickley added a comment -

        Doh, my apologies – as I said this isn't yet well tested. If you call with '--help' it fails:

        Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from a null string

        Still, I hope it's better to share work-in-progress here rather than let it rot on my laptop. Attaching a fix as MAHOUT-839-v2.patch

        Dan Brickley added a comment -

        For integration/src/main/java/org/apache/mahout/utils/vectors/RowIdJob.java
        aka 'bin/mahout rowid'...

        Dan Brickley added a comment -

        I rewrote this based on one of the other drivers with more complex options. Attaching a patch, which got me working again but needs a careful check.

        In an attempt to drift towards standard options, it also adds --overwrite, which in turn calls

        + HadoopUtil.delete(getConf(), outputDir);

        Please assume this will rm -rf all your most precious files and treat with appropriate caution.

        Also the input option tries to match smartly using 'PathType.LIST, PathFilters.partFilter()'... and the help text should be accurate afaik.
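
        To make the --overwrite caution concrete, here is a rough sketch of the same idea on the local filesystem. It is not the patch and does not use HadoopUtil; the class OverwriteSketch and the method prepareOutput are hypothetical names, and java.nio.file stands in for the Hadoop FileSystem API. The existing output directory is removed recursively only when the caller explicitly asks for it.

        ```java
        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.Comparator;
        import java.util.List;
        import java.util.stream.Collectors;
        import java.util.stream.Stream;

        public class OverwriteSketch {

            // Returns true when outputDir is clear to write to.
            // An existing directory is deleted recursively only if overwrite is true;
            // otherwise it is left untouched and false is returned.
            static boolean prepareOutput(Path outputDir, boolean overwrite) throws IOException {
                if (!Files.exists(outputDir)) {
                    return true;
                }
                if (!overwrite) {
                    return false;
                }
                List<Path> paths;
                try (Stream<Path> walk = Files.walk(outputDir)) {
                    // Reverse order lists children before their parent directories.
                    paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
                }
                for (Path p : paths) {
                    Files.delete(p);
                }
                return true;
            }
        }
        ```

        Defaulting to "refuse and report" when the flag is absent keeps an accidental rerun from wiping earlier results, which is the risk the warning above is pointing at.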


          People

          • Assignee: Grant Ingersoll
          • Reporter: Dan Brickley
          • Votes: 0
          • Watchers: 0
