  1. Mahout
  2. MAHOUT-1147

CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.8
    • Component/s: classic
    • Environment: Eclipse IDE
      Java code base
      CVB0Driver Class
      setModelPaths(Job job, Path modelPath) - method

    Description

      Problem:
      When training the doc/topic model, no paths for the term/topic model are found (the lookup returns null).
      These paths are set using setModelPaths in CVB0Driver.

      Reason for Problem:
      setModelPaths is called with a variety of Job instances.
      The Job is passed to the method instead of the Configuration object that was given to the Job.
      The method then retrieves the configuration from the Job instance itself.
      I believe that this Configuration instance is a clone of the original.
      This is a problem because MODEL_PATHS is set on the clone, which is discarded when the given Job completes.
      The original Configuration has no MODEL_PATHS string set, so reading it returns null.
      When the code cannot find a model, it falls back to a new random matrix. This happens every time, because MODEL_PATHS is never set on the Configuration instance that is actually used.
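
      The suspected mechanism can be illustrated with a small, self-contained sketch. It does not use Hadoop itself: SimpleConf and SimpleJob are hypothetical stand-ins for Configuration and Job, the key "model.paths" and the path "/tmp/model-1" are placeholders, and the sketch assumes (as the report does) that a job works on its own copy of the configuration it was built from.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object: a string map with a copy constructor.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    SimpleConf() {}

    // Copy constructor: the new object gets its own independent map.
    SimpleConf(SimpleConf other) { props.putAll(other.props); }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Hypothetical stand-in for a job that copies the configuration it is given (assumption).
class SimpleJob {
    private final SimpleConf conf;

    SimpleJob(SimpleConf original) { this.conf = new SimpleConf(original); }

    SimpleConf getConfiguration() { return conf; }
}

class ClonedConfDemo {
    // Placeholder key, not the real constant value from CVB0Driver.
    static final String MODEL_PATHS = "model.paths";

    // Mirrors the reported pattern: the value is written to the job's private copy,
    // then read back from the original configuration.
    static String setOnJobThenReadOriginal() {
        SimpleConf conf = new SimpleConf();
        SimpleJob job = new SimpleJob(conf);
        job.getConfiguration().set(MODEL_PATHS, "/tmp/model-1");
        return conf.get(MODEL_PATHS);
    }

    public static void main(String[] args) {
        // The original configuration never saw the write, so this prints "null".
        System.out.println(setOnJobThenReadOriginal());
    }
}
```

      Under that assumption, any later code that reads MODEL_PATHS from the original configuration sees nothing, which matches the null described above.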

      Solution:
      Do not pass the Job to the setModelPaths method; instead, pass the Configuration instance that was passed into the method which created the Job.
      i.e.
      change from:
      setModelPaths(Job job, Path modelPath)

      to:
      setModelPaths(Configuration conf, Path modelPath)

      And change all calling methods accordingly (obviously).
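
      A minimal sketch of the proposed change, again without Hadoop: DriverConf and DriverJob are hypothetical stand-ins for Configuration and Job, the key and path are placeholders, and a plain String replaces Path for brevity. It assumes a job copies its configuration at construction, so a value set on the driver's configuration beforehand is visible both to the driver and to jobs created afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object.
class DriverConf {
    private final Map<String, String> props = new HashMap<>();

    DriverConf() {}

    // Copy constructor: the copy does not share state with the source.
    DriverConf(DriverConf other) { props.putAll(other.props); }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Hypothetical stand-in for a job that copies its configuration (assumption).
class DriverJob {
    private final DriverConf conf;

    DriverJob(DriverConf original) { this.conf = new DriverConf(original); }

    DriverConf getConfiguration() { return conf; }
}

class FixedDriverSketch {
    // Placeholder key, not the real constant value from CVB0Driver.
    static final String MODEL_PATHS = "model.paths";

    // Shape of the proposed signature: take the configuration, not the job.
    static void setModelPaths(DriverConf conf, String modelPath) {
        conf.set(MODEL_PATHS, modelPath);
    }

    // Sets the path on the driver's configuration, then builds a job from it.
    static String pathSeenByLaterJob() {
        DriverConf conf = new DriverConf();
        setModelPaths(conf, "/tmp/model-1");
        DriverJob job = new DriverJob(conf); // copy is taken after the value is set
        return job.getConfiguration().get(MODEL_PATHS);
    }

    public static void main(String[] args) {
        System.out.println(pathSeenByLaterJob());
    }
}
```

      The ordering matters in this sketch: the value has to be set on the configuration before the job that needs it is created.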

      So far, the little testing I have done suggests that this change solves the problem.

      Attachments

        1. MAHOUT-1147.patch
          4 kB
          Grant Ingersoll
        2. MAHOUT-1147.patch
          4 kB
          Chenghao Liu

          People

            Assignee: Jake Mannix (jake.mannix)
            Reporter: Jack Pay (jp242@sussex.ac.uk)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified