MAHOUT-61: Text problem matrix builder

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Later

    Description

      A set of classes that builds matrices from text.

      Currently the API consists of TokenMatrixBuilder and TokenInstanceBuilder. Both classes should be thread safe.
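
      To make the idea concrete, here is a minimal sketch of a thread-safe builder that turns text into sparse token-count rows. It is not the code from the attached patch: the class name TokenMatrixSketch, its methods, and the tokenization rule (lower-case, split on non-letters) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the TokenMatrixBuilder from the attached patch. */
class TokenMatrixSketch {
    // token -> column index; ConcurrentHashMap keeps assignment atomic per token
    private final ConcurrentHashMap<String, Integer> columns = new ConcurrentHashMap<>();
    private final AtomicInteger nextColumn = new AtomicInteger();
    // one sparse row (column index -> token count) per document
    private final List<Map<Integer, Integer>> rows = new ArrayList<>();

    /** Tokenizes the text, appends one sparse row, and returns the row index. */
    int addDocument(String text) {
        Map<Integer, Integer> row = new TreeMap<>();
        // Assumed tokenization: lower-case, split on runs of non-letters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // leading separators produce an empty first token
            }
            int col = columns.computeIfAbsent(token, t -> nextColumn.getAndIncrement());
            row.merge(col, 1, Integer::sum);
        }
        synchronized (rows) {
            rows.add(row);
            return rows.size() - 1;
        }
    }

    /** Returns the column index of a token, or -1 if it has not been seen. */
    int columnOf(String token) {
        Integer col = columns.get(token);
        return col == null ? -1 : col;
    }

    /** Returns how often the token in the given column occurs in the given row. */
    int count(int row, int col) {
        synchronized (rows) {
            return rows.get(row).getOrDefault(col, 0);
        }
    }

    int numRows() {
        synchronized (rows) {
            return rows.size();
        }
    }

    int numColumns() {
        return columns.size();
    }
}
```

      Each row is built in a thread-local map, so the only shared state is the token dictionary and the row list. Several threads can therefore call addDocument at the same time.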

      PostReader imports the 20news-bydate dataset, which takes several GB of heap. It would be nice to bounce the data via JDBM, or perhaps to use the PersistentHashMap from MAHOUT-19.
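
      As a rough illustration of keeping the imported data off the heap, the sketch below spills sparse rows to a temporary file and streams them back one at a time. It uses plain java.io as a stand-in and does not use JDBM or the PersistentHashMap from MAHOUT-19. The class name SpilledRows, the RowVisitor interface, and the file layout are assumptions.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: sparse rows are appended to a temp file instead of held on the heap. */
class SpilledRows implements Closeable {
    /** Callback invoked once per stored row. */
    interface RowVisitor {
        void visit(int index, Map<Integer, Integer> row) throws IOException;
    }

    private final Path file;
    private final DataOutputStream out;
    private int rowCount;

    SpilledRows() throws IOException {
        file = Files.createTempFile("rows", ".bin");
        out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)));
    }

    /** Appends one row as: entry count, then (column, count) pairs. */
    synchronized void append(Map<Integer, Integer> row) throws IOException {
        out.writeInt(row.size());
        for (Map.Entry<Integer, Integer> e : row.entrySet()) {
            out.writeInt(e.getKey());
            out.writeInt(e.getValue());
        }
        rowCount++;
    }

    /** Flushes pending writes, then reads the rows back in insertion order. */
    synchronized void forEachRow(RowVisitor visitor) throws IOException {
        out.flush();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (int r = 0; r < rowCount; r++) {
                int entries = in.readInt();
                Map<Integer, Integer> row = new TreeMap<>();
                for (int i = 0; i < entries; i++) {
                    int col = in.readInt();
                    int cnt = in.readInt();
                    row.put(col, cnt);
                }
                visitor.visit(r, row);
            }
        }
    }

    /** Closes the writer and deletes the temp file. */
    @Override
    public synchronized void close() throws IOException {
        out.close();
        Files.deleteIfExists(file);
    }
}
```

      Only one row is materialized at a time while reading, so heap use stays proportional to the largest row rather than to the whole corpus. A real key-value store would add random access, which this sequential file does not provide.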

      Attachments

        1. MAHOUT-61.txt
          27 kB
          Karl Wettin
        2. MAHOUT-61.txt
          44 kB
          Karl Wettin
        3. MAHOUT-61.txt
          64 kB
          Karl Wettin

            People

              karl.wettin Karl Wettin