  1. Mahout
  2. MAHOUT-399

LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.3, 0.4, 0.5
    • Fix Version/s: 0.7
    • Component/s: Classification
    • Labels:
    • Environment:

      Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.

      Description

      Hello,

      Apologies if I have not labeled this correctly.

      I ran a toy problem through LDA on Mahout 0.3 (locally). It is the same problem I used to test the C version of LDA that Blei posts on his site, and it has an exact solution that LDA should converge to. Please see the attached PDF, which describes the intended output.

      Is LDA working? The following output indicates some sort of collapsing behavior to me.

      T0 T1 T2 T3 T4
      x w x u x
      u u g j n
      l r i m l
      j q h h p
      v p e i q
      e t f g v
      d s d f o
      b c b n k
      y f c l m
      w v u v u
      c d p y t
      k o l r r
      i b j k j
      f e k e f
      g x y s y
      t y w b w
      h i s p s
      o l v x d
      q j t d i
      n k o t b

      The intended output is (again, please see attached):

      D I N S X
      d i n s x
      c h m t y
      e j o r w
      b k l u v
      f g p q a
      a f k p b
      g l q v u
      h m j w t
      y u r o c
      n s d d i
      s e x f f
      r q i i n
      m v w c o
      o w u a h
      q n s h g
      p t c x d
      t x f e l
      x d e j s
      w y g b j
      i r y n r
      u o h y m
      k b t l e
      v c a m k
      j a b g p
      l p v k q

      What tests do you run to make sure the output is correct?

      Thank you,
      Mike.
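
      One rough way to quantify the mismatch between the two tables above is to match each learned topic to the intended topic with which it shares the most top terms. The sketch below is illustrative only: the top-5 term lists are copied from the first five rows of each table in this report, while the function name `match_topics` and the choice of a simple shared-term count are assumptions, not part of Mahout or its test suite.

```python
# Top-5 terms per topic, copied from the first five rows of the two tables
# in this report (observed Mahout output and intended output).
OBSERVED = {
    "T0": "xuljv",
    "T1": "wurqp",
    "T2": "xgihe",
    "T3": "ujmhi",
    "T4": "xnlpq",
}
INTENDED = {
    "D": "dcebf",
    "I": "ihjkg",
    "N": "nmolp",
    "S": "struq",
    "X": "xywva",
}


def match_topics(observed, intended):
    """For each observed topic, find the intended topic sharing the most top terms.

    Hypothetical helper: returns a list of
    (observed_name, best_intended_label, shared_term_count) tuples.
    """
    results = []
    for name, terms in observed.items():
        # Count shared terms against every intended topic and keep the best one.
        label, shared = max(
            ((lbl, len(set(terms) & set(ref))) for lbl, ref in intended.items()),
            key=lambda pair: pair[1],
        )
        results.append((name, label, shared))
    return results


if __name__ == "__main__":
    for name, label, shared in match_topics(OBSERVED, INTENDED):
        print(f"{name}: best match {label} ({shared} of 5 top terms shared)")
```

      If two learned topics map to the same intended topic while another intended topic is never matched, that is consistent with the collapsing behavior described above. A fuller check would compare the full term distributions rather than top-term sets.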

        Attachments

        1. MAHOUT-399.diff
          13 kB
          Jake Mannix
        2. 1000docs_26terms_5topics.jpg
          52 kB
          Jake Mannix
        3. olt.tar
          9.77 MB
          Michael Lazarus
        4. Overlapping Pyramids Toy Dataset.pdf
          936 kB
          Michael Lazarus

              People

              • Assignee:
                jake.mannix Jake Mannix
                Reporter:
                mikelazarus Michael Lazarus