Mahout / MAHOUT-641

DistributedRowMatrix hadoop jobs ignore Configuration set via setConf()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4, 0.5
    • Fix Version/s: 0.5
    • Component/s: Math
    • Environment:

      Mahout 0.4 and 0.5-SNAPSHOT run with Hadoop 0.20.2 on Mac OS 10.6 and Linux x86_64 2.6.18

      Description

      I am using the Distributed Lanczos solver, which uses the DistributedRowMatrix class for its internal calculations. In our environment, I need to set some Configuration properties (specifically hadoop.job.ugi and hadoop.queue.name), which I set via the setConf() method on DistributedLanczosSolver. These are correctly passed to DistributedRowMatrix via its setConf() method, but are not passed into the Hadoop JobConfs created by the various static routines in MatrixMultiplicationJob, TimesSquaredJob, and TransposeJob.
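
      The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use Hadoop or Mahout classes: Conf is a hypothetical stand-in for a Hadoop-style Configuration, and buggyJobConf / fixedJobConf are hypothetical helpers, not the actual static routines in MatrixMultiplicationJob, TimesSquaredJob, or TransposeJob. The property name hadoop.queue.name comes from the report; the value "research" and the key sketch.input.path are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfPropagationSketch {

    /** Hypothetical stand-in for a Hadoop-style Configuration: a bag of string properties. */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        /** Copy constructor: the new Conf starts with every property of the other one. */
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /**
     * The pattern from the report: a static helper builds its job configuration
     * from scratch, so anything the caller set earlier is silently dropped.
     */
    static Conf buggyJobConf(Conf callerConf, String inputPath) {
        Conf job = new Conf();                    // callerConf is ignored
        job.set("sketch.input.path", inputPath);  // hypothetical job-specific key
        return job;
    }

    /**
     * One possible fix: seed the job configuration from the caller's configuration,
     * then layer the job-specific settings on top.
     */
    static Conf fixedJobConf(Conf callerConf, String inputPath) {
        Conf job = new Conf(callerConf);          // caller's properties carried over
        job.set("sketch.input.path", inputPath);
        return job;
    }

    public static void main(String[] args) {
        Conf caller = new Conf();
        caller.set("hadoop.queue.name", "research");

        // The caller's queue setting is missing from the buggy job configuration.
        System.out.println(buggyJobConf(caller, "/in").get("hadoop.queue.name")); // prints "null"

        // The fixed variant preserves it.
        System.out.println(fixedJobConf(caller, "/in").get("hadoop.queue.name")); // prints "research"
    }
}
```

      In this sketch both helpers receive the caller's configuration, which makes the difference visible: only the second one actually uses it. Whether the attached patch takes this exact approach is not stated in the report.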

      1. MAHOUT-641.patch
        19 kB
        Jonathan Traupman

        Activity

        Ted Dunning added a comment -

        Nice work Jonathan!

        On Tue, Mar 29, 2011 at 11:42 AM, Jonathan Traupman (JIRA)

        Jonathan Traupman added a comment -

        Patch for MAHOUT-641. Diffed from revision 1086678.

        Jonathan Traupman added a comment -

        Sorry for the duplicate message – still figuring out the patch submission process.

        Anyway, the patch is uploaded, all existing tests pass and I added 3 new cases for this specific bug. The patch fixes the problems I was having in our environment.

        Sean Owen added a comment -

        I'll commit it. More broadly I think there is some inconsistency in the code about who passes a Configuration when, and a lot of code ignores or makes its own Configuration. Configuration is the context of Hadoop operations, and probably needs to be passed around most everywhere in Mahout (rather than be created anew). The code probably "gets away" with not doing it in several cases, but not this one.

        I cleaned up a small version of this in MAHOUT-633, having to do with obtaining a FileSystem to delete files properly. It's a broader issue and not one to solve right now perhaps.

        Hudson added a comment -

        Integrated in Mahout-Quality #708 (See https://hudson.apache.org/hudson/job/Mahout-Quality/708/)


          People

          • Assignee: Unassigned
          • Reporter: Jonathan Traupman
          • Votes: 1
          • Watchers: 0

              Time Tracking

              • Original Estimate: 24h
              • Remaining Estimate: 24h
              • Time Spent: Not Specified
