Mahout / MAHOUT-944

LuceneIndexToSequenceFiles (lucene2seq) utility

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.8
    • Component/s: Integration
    • Labels:
      None

      Description

      Here is a lucene2seq tool I used in a project. It creates sequence files from the stored fields of a Lucene index.

      The output of this tool can then be fed into seq2sparse, and from there you can do text clustering.

      Comes with Java bean configuration.

      Let me know what you think. Some CLI code can be added later on. I used this for a small-scale project of roughly 100,000 docs. Would a MapReduce version be useful, or is that overkill?

      See https://github.com/frankscholten/mahout/tree/lucene2seq for commits and review comments from Simon Willnauer (thanks, Simon!), or see the attached patch.
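
      To make the conversion described above concrete, here is a minimal conceptual sketch. It is not the code from the patch: plain Java maps stand in for Lucene documents with stored fields, and a list of key/value pairs stands in for the records a sequence file would hold. The class name Lucene2SeqSketch, the method toRecords, the field names, and the choice to join fields with a space are all assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Conceptual sketch only. Plain collections stand in for Lucene documents
 * and for Hadoop sequence file records; no Lucene or Hadoop API is used.
 */
public class Lucene2SeqSketch {

    /**
     * Turns each document's stored fields into one key/value record.
     * Key: the value of idField. Value: the selected fields joined by a space.
     * Documents without the id field are skipped; missing text fields are ignored.
     */
    public static List<Map.Entry<String, String>> toRecords(
            List<Map<String, String>> docs, String idField, List<String> fields) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            String id = doc.get(idField);
            if (id == null) {
                continue; // no key available for this document
            }
            StringBuilder text = new StringBuilder();
            for (String field : fields) {
                String value = doc.get(field);
                if (value == null) {
                    continue; // field not stored for this document
                }
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(value);
            }
            records.add(new SimpleEntry<>(id, text.toString()));
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical document with three stored fields.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("title", "Mahout clustering");
        doc.put("body", "Text clustering with k-means");

        // Print one tab-separated line per record.
        for (Map.Entry<String, String> record
                : toRecords(List.of(doc), "id", List.of("title", "body"))) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

      In a real run, each record would presumably be written to a sequence file as a key/value pair, and that file would then serve as the input to seq2sparse.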

      1. MAHOUT-944-minor.patch
        69 kB
        Grant Ingersoll
      2. MAHOUT-944.patch
        20 kB
        Frank Scholten
      3. MAHOUT-944.patch
        53 kB
        Frank Scholten
      4. MAHOUT-944.patch
        39 kB
        Frank Scholten
      5. MAHOUT-944.patch
        39 kB
        Frank Scholten
      6. MAHOUT-944.patch
        86 kB
        Frank Scholten
      7. MAHOUT-944.patch
        377 kB
        Frank Scholten
      8. MAHOUT-944.patch
        85 kB
        Grant Ingersoll
      9. MAHOUT-944.patch
        82 kB
        Grant Ingersoll
      10. MAHOUT-944.patch
        81 kB
        Grant Ingersoll
      11. MAHOUT-944.patch
        81 kB
        Grant Ingersoll
      12. MAHOUT-944.patch
        86 kB
        Grant Ingersoll
      13. MAHOUT-944.patch
        91 kB
        Grant Ingersoll

        Activity

        Frank Scholten created issue
        Frank Scholten made changes -
        Field Original Value New Value
        Attachment MAHOUT-944.patch [ 12510170 ]
        Frank Scholten made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.5 [ 12315255 ]
        Affects Version/s 0.7 [ 12319261 ]
        Fix Version/s 0.7 [ 12319261 ]
        Fix Version/s 0.5 [ 12315255 ]
        Fix Version/s 0.6 [ 12316364 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944.patch [ 12514122 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944.patch [ 12514124 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944.patch [ 12514209 ]
        Grant Ingersoll made changes -
        Assignee Grant Ingersoll [ gsingers ]
        Frank Scholten made changes -
        Attachment MAHOUT-944.patch [ 12516299 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944-a.patch [ 12516678 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944-b.patch [ 12516683 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944.patch [ 12517056 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944-b.patch [ 12516683 ]
        Frank Scholten made changes -
        Attachment MAHOUT-944-a.patch [ 12516678 ]
        Grant Ingersoll made changes -
        Fix Version/s 0.8 [ 12320153 ]
        Fix Version/s 0.7 [ 12319261 ]
        Grant Ingersoll made changes -
        Attachment MAHOUT-944.patch [ 12585756 ]
        Grant Ingersoll made changes -
        Attachment MAHOUT-944.patch [ 12585760 ]
        Grant Ingersoll made changes -
        Attachment MAHOUT-944.patch [ 12585762 ]
        Grant Ingersoll made changes -
        Attachment MAHOUT-944.patch [ 12585764 ]
        Grant Ingersoll made changes -
        Attachment MAHOUT-944.patch [ 12586401 ]
        Grant Ingersoll made changes -
        Attachment MAHOUT-944.patch [ 12586507 ]
        Grant Ingersoll made changes -
        Comment [ I'll let it sit for a day or two and then commit. ]
        Grant Ingersoll made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Grant Ingersoll made changes -
        Attachment MAHOUT-944-minor.patch [ 12586531 ]
        Suneel Marthi made changes -
        Comment [ I agree, the whole thing's messed up now. :) ]
        Suneel Marthi made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Frank Scholten
          • Votes:
            2
            Watchers:
            7

            Dates

            • Created:
              Updated:
              Resolved:
