Mahout / MAHOUT-944

LuceneIndexToSequenceFiles (lucene2seq) utility


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.8
    • Component/s: classic
    • Labels: None

    Description

      Here is a lucene2seq tool I used in a project. It creates sequence files from the stored fields of a Lucene index.

      The output of this tool can then be fed into seq2sparse, and from there you can do text clustering.

      It comes with Java bean configuration.
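
      The conversion described above can be illustrated with a minimal plain-Java sketch. It is not the code from the patch: the class name `Lucene2SeqSketch`, the `Config` bean and its `idField`/`fields` properties, and the `convert` method are all hypothetical names chosen for illustration. To stay self-contained, it models each Lucene document as a map of stored field names to values and returns in-memory key/value pairs instead of writing a Hadoop SequenceFile. It assumes one stored field serves as the record key and that the remaining selected fields are concatenated into the record value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Lucene2SeqSketch {

    /** Hypothetical bean-style configuration (not the patch's actual class). */
    public static class Config {
        private String idField;                          // stored field used as the record key
        private List<String> fields = new ArrayList<>(); // stored fields joined into the record value

        public String getIdField() { return idField; }
        public void setIdField(String idField) { this.idField = idField; }
        public List<String> getFields() { return fields; }
        public void setFields(List<String> fields) { this.fields = fields; }
    }

    /**
     * Turns each document (modelled as stored field name -> value) into one
     * key/value record. Documents without the id field are skipped, and
     * missing value fields are ignored.
     */
    public static Map<String, String> convert(List<Map<String, String>> docs, Config config) {
        Map<String, String> records = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            String key = doc.get(config.getIdField());
            if (key == null) {
                continue; // no key, so no record can be written for this document
            }
            StringBuilder value = new StringBuilder();
            for (String field : config.getFields()) {
                String text = doc.get(field);
                if (text == null) {
                    continue;
                }
                if (value.length() > 0) {
                    value.append(' ');
                }
                value.append(text);
            }
            records.put(key, value.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        Config config = new Config();
        config.setIdField("id");
        config.setFields(Arrays.asList("title", "body"));

        Map<String, String> doc = new HashMap<>();
        doc.put("id", "doc-1");
        doc.put("title", "Mahout");
        doc.put("body", "text clustering example");

        // Each entry stands in for one key/value record of the sequence file.
        System.out.println(convert(Arrays.asList(doc), config));
    }
}
```

      In a real pipeline the records would be written to a sequence file with text keys and text values, which is the input format seq2sparse reads.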

      Let me know what you think. Some CLI code can be added later on. I used this for a small-scale project of roughly 100,000 docs. Is a MapReduce version useful, or is that overkill?

      See https://github.com/frankscholten/mahout/tree/lucene2seq for commits and review comments from Simon Willnauer (thanks, Simon!), or see the attached patch.

      Attachments

        1. MAHOUT-944.patch (91 kB, Grant Ingersoll)
        2. MAHOUT-944.patch (86 kB, Grant Ingersoll)
        3. MAHOUT-944.patch (81 kB, Grant Ingersoll)
        4. MAHOUT-944.patch (81 kB, Grant Ingersoll)
        5. MAHOUT-944.patch (82 kB, Grant Ingersoll)
        6. MAHOUT-944.patch (85 kB, Grant Ingersoll)
        7. MAHOUT-944.patch (377 kB, Frank Scholten)
        8. MAHOUT-944.patch (86 kB, Frank Scholten)
        9. MAHOUT-944.patch (39 kB, Frank Scholten)
        10. MAHOUT-944.patch (39 kB, Frank Scholten)
        11. MAHOUT-944.patch (53 kB, Frank Scholten)
        12. MAHOUT-944.patch (20 kB, Frank Scholten)
        13. MAHOUT-944-minor.patch (69 kB, Grant Ingersoll)


          People

            Assignee: Grant Ingersoll (gsingers)
            Reporter: Frank Scholten (frankscholten)
            Votes: 2
            Watchers: 7

