MAHOUT-799

Cannot run SequenceFilesFromCsvFilter, ever

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: Examples
    • Labels:
      None

      Description

      As described here:

      http://mail-archives.apache.org/mod_mbox/mahout-user/201106.mbox/%3C4DED5DCD.6050107@gmail.com%3E

      SequenceFilesFromCsvFilter cannot be invoked with default parameter values, because it dies like so:

      bin/mahout seqdirectory -i input -o output -filter org.apache.mahout.text.SequenceFilesFromCsvFilter

      ...
      Caused by: java.lang.NumberFormatException: null
      at java.lang.Integer.parseInt(Integer.java:417)
      at java.lang.Integer.parseInt(Integer.java:499)
      at org.apache.mahout.text.SequenceFilesFromCsvFilter.<init>(SequenceFilesFromCsvFilter.java:56)

      If one adds the parameters -kcol 0 -vcol 0 (or their long-form versions), it dies like so:

      Unexpected -kcol while processing Job-Specific Options

      Commenting out SequenceFilesFromCsvFilter:56 and SequenceFilesFromCsvFilter:57, like so, allows the run to proceed:

      // this.keyColumn = Integer.parseInt(options.get(KEY_COLUMN_OPTION[0]));
      // this.valueColumn = Integer.parseInt(options.get(VALUE_COLUMN_OPTION[0]));
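      The stack trace above is consistent with Integer.parseInt receiving null because the option was never supplied. The following is a minimal sketch of that failure mode and of one defensive alternative. It is not Mahout code: the class name, the helper methods, and the "keyColumn" option name are all hypothetical, and a plain Map stands in for the filter's options.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the filter's option handling; not Mahout code.
final class OptionParsingSketch {

  // Reproduces the failure mode: a missing key yields null, and
  // Integer.parseInt(null) throws NumberFormatException.
  static boolean failsOnMissingOption(Map<String, String> options, String name) {
    try {
      Integer.parseInt(options.get(name));
      return false;
    } catch (NumberFormatException e) {
      return true;
    }
  }

  // Defensive variant: fall back to a default when the option is absent.
  static int parseIntOrDefault(Map<String, String> options, String name, int defaultValue) {
    String raw = options.get(name);
    return raw == null ? defaultValue : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    // "keyColumn" is an assumed option name used only for illustration.
    System.out.println(failsOnMissingOption(options, "keyColumn"));
    System.out.println(parseIntOrDefault(options, "keyColumn", 0));
    options.put("keyColumn", "2");
    System.out.println(parseIntOrDefault(options, "keyColumn", 0));
  }
}
```

      A default would only address the first symptom. The second failure ("Unexpected -kcol") comes from the command-line parser not knowing about the filter's options, which this sketch does not attempt to solve.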

      1. MAHOUT-799.patch
        5 kB
        Sean Owen
      2. MAHOUT-799.patch
        17 kB
        Sean Owen

        Activity

        Sean Owen added a comment -

        I'm also confused, reading this. SequenceFilesFromCsvFilter works when run as a command-line program. But when used this way it never adds its options to the command line and can't work. Was this the intent of the design? It seems there needs to be additional cross-wiring for these filters to participate in the command line.

        Jack Tanner added a comment -

        To avoid having to build the cross-wiring, you could just detect this execution pattern and exit with a message that explains the proper command-line use.

        Which begs the question, how does one run it correctly from the command line?
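        A rough sketch of the "exit with a message" idea, under stated assumptions: the class, the method, the option name, and the hint text below are hypothetical and are not part of Mahout. The sketch validates a required integer option up front and reports a readable error instead of letting a bare NumberFormatException escape.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical fail-fast validation; not Mahout code.
final class FailFastOptionsSketch {

  // Returns the parsed option, or throws with a message that names the
  // option and carries a caller-supplied usage hint.
  static int requireInt(Map<String, String> options, String name, String usageHint) {
    String raw = options.get(name);
    if (raw == null) {
      throw new IllegalArgumentException(
          "Missing required option '" + name + "'. " + usageHint);
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Option '" + name + "' must be an integer, got '" + raw + "'. " + usageHint, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    try {
      // "keyColumn" and the hint text are placeholders for illustration.
      requireInt(options, "keyColumn", "See the tool's usage notes.");
    } catch (IllegalArgumentException e) {
      System.err.println(e.getMessage());
    }
  }
}
```

        The hint text is deliberately generic, since the thread has not yet established what the correct invocation is.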

        Sean Owen added a comment -

        Right now there is no proper command-line use, it seems. I don't know what was intended here. Who wrote this bit? It's not clear from the SVN logs.

        Sean Owen added a comment -

        Hmm, the author didn't follow up. As far as I can tell, the -filter option should never have been added. The only subclass was not written to work as an 'argument', but only as a command-line program. My best fix is to just remove it. You can use this, still, by running it directly as the command-line program.

        Sean Owen added a comment -

        OK, different answer: I don't think the CSV filter can be 'saved'. As it's currently designed, I'm unable to make it work even after shaking out the rest of the knock-on issues. Instead of removing -filter, I think we should just remove this implementation, tidy up a bit, and leave the integration point for someone to try again. Here's a new patch that removes it and tidies up instead. I don't actually think it's controversial since 1) it doesn't work now and 2) it didn't actually read CSV data to begin with!

        Hudson added a comment -

        Integrated in Mahout-Quality #1068 (See https://builds.apache.org/job/Mahout-Quality/1068/)
        MAHOUT-799 remove CSV filter that wasn't working

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177027
        Files :

        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/PrefixAdditionFilter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromCsvFilter.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectory.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectoryFilter.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/text/TestSequenceFilesFromDirectory.java

          People

          • Assignee: Sean Owen
          • Reporter: Jack Tanner