MAHOUT-399

LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.3, 0.4, 0.5
    • Fix Version/s: 0.7
    • Component/s: Classification
    • Labels:
    • Environment: Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.

      Description

      Hello,

      Apologies if I have not labeled this correctly.

      I ran an LDA toy problem locally on Mahout 0.3. It is the same problem I used to test Blei's C implementation of LDA, which he posts on his site. The problem has an exact solution that LDA should converge to; please see the attached PDF, which describes the intended output.

      Is LDA working? The following output suggests some sort of collapsing behavior to me:

      T0 T1 T2 T3 T4
      x w x u x
      u u g j n
      l r i m l
      j q h h p
      v p e i q
      e t f g v
      d s d f o
      b c b n k
      y f c l m
      w v u v u
      c d p y t
      k o l r r
      i b j k j
      f e k e f
      g x y s y
      t y w b w
      h i s p s
      o l v x d
      q j t d i
      n k o t b

      The intended output is (again, please see attached):

      D I N S X
      d i n s x
      c h m t y
      e j o r w
      b k l u v
      f g p q a
      a f k p b
      g l q v u
      h m j w t
      y u r o c
      n s d d i
      s e x f f
      r q i i n
      m v w c o
      o w u a h
      q n s h g
      p t c x d
      t x f e l
      x d e j s
      w y g b j
      i r y n r
      u o h y m
      k b t l e
      v c a m k
      j a b g p
      l p v k q

      What tests do you run to make sure the output is correct?

      Thank you,
      Mike.
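
One way to put a number on the collapse described above is to compare the top terms of each observed topic with the top terms of each intended topic. The following is a minimal sketch, not a Mahout test: the Jaccard overlap measure, the top-5 cutoff, and the names `observed`, `intended`, `jaccard`, and `best_match` are all assumptions made for illustration. It also assumes each column of the two tables lists one topic's terms from the top down. The term lists are transcribed from the first five rows of each table.

```python
# Top-5 terms per topic, read column-wise from the first five rows of the
# two tables in this report (assumption: each column is one topic).
observed = {
    "T0": "xuljv",
    "T1": "wurqp",
    "T2": "xgihe",
    "T3": "ujmhi",
    "T4": "xnlpq",
}
intended = {
    "D": "dcebf",
    "I": "ihjkg",
    "N": "nmolp",
    "S": "struq",
    "X": "xywva",
}


def jaccard(a, b):
    """Jaccard overlap between two collections of terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def best_match(terms, reference):
    """Name of the reference topic whose top terms overlap most with `terms`."""
    return max(reference, key=lambda name: jaccard(terms, reference[name]))


# Map every observed topic to its closest intended topic.
matches = {name: best_match(terms, intended) for name, terms in observed.items()}

# Intended topics that no observed topic maps to.
unmatched = sorted(set(intended) - set(matches.values()))

print(matches)
print(unmatched)
```

On the top-5 lists above, T2 and T3 both map to the intended topic I, and no observed topic maps to D. That is consistent with the collapsing behavior reported here, although a five-term cutoff is a coarse check.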

      1. MAHOUT-399.diff (13 kB, Jake Mannix)
      2. 1000docs_26terms_5topics.jpg (52 kB, Jake Mannix)
      3. olt.tar (9.77 MB, Michael Lazarus)
      4. Overlapping Pyramids Toy Dataset.pdf (936 kB, Michael Lazarus)


            People

            • Assignee: Jake Mannix
            • Reporter: Michael Lazarus
            • Votes: 0
            • Watchers: 4
