  1. Mahout
  2. MAHOUT-1286

Memory-efficient DataModel, supporting fast online updates and element-wise iteration


    Details

      Description

      Most DataModel implementations in the current CF (collaborative filtering) component use hash maps to enable fast 2D indexing and updates. This is not memory-efficient for large data sets; for example, the Netflix Prize dataset takes 11 GB of heap space as a FileDataModel.

      An improved DataModel implementation should use a more compact data structure, such as arrays. This trades a little time complexity in 2D indexing for a vast improvement in memory efficiency. In addition, the training process of any online recommender, or online-to-batch converted recommender, will not be affected by this change.
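
      The array-based idea above can be sketched roughly as follows. This is an illustrative sketch only, not the attached InMemoryDataModel.java: the class name `CompactPreferenceStore`, its methods, and the `PreferenceVisitor` interface are all hypothetical, and it does not implement Mahout's DataModel interface. It assumes preferences are kept in three parallel primitive arrays sorted by (userID, itemID), so a 2D lookup costs a binary search instead of a hash probe.

```java
import java.util.Arrays;

/** Hypothetical sketch of a compact, array-backed preference store. */
final class CompactPreferenceStore {

    /** Hypothetical callback for element-wise iteration. */
    interface PreferenceVisitor {
        void visit(long userID, long itemID, float value);
    }

    // Parallel arrays, kept sorted by (userID, itemID).
    private long[] userIDs;
    private long[] itemIDs;
    private float[] values;
    private int size;

    CompactPreferenceStore(int initialCapacity) {
        int capacity = Math.max(1, initialCapacity);
        userIDs = new long[capacity];
        itemIDs = new long[capacity];
        values = new float[capacity];
    }

    int size() {
        return size;
    }

    /** Binary search; returns the index, or -(insertionPoint) - 1 if absent. */
    private int indexOf(long userID, long itemID) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = Long.compare(userIDs[mid], userID);
            if (cmp == 0) {
                cmp = Long.compare(itemIDs[mid], itemID);
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }

    /** Overwrites an existing preference in place, or inserts a new one in sorted order. */
    void setPreference(long userID, long itemID, float value) {
        int idx = indexOf(userID, itemID);
        if (idx >= 0) {
            values[idx] = value;
            return;
        }
        int at = -(idx + 1);
        if (size == userIDs.length) {
            // Grow all three arrays together so they stay parallel.
            int newCapacity = size * 2;
            userIDs = Arrays.copyOf(userIDs, newCapacity);
            itemIDs = Arrays.copyOf(itemIDs, newCapacity);
            values = Arrays.copyOf(values, newCapacity);
        }
        // Shift the tail right by one slot to make room at the insertion point.
        System.arraycopy(userIDs, at, userIDs, at + 1, size - at);
        System.arraycopy(itemIDs, at, itemIDs, at + 1, size - at);
        System.arraycopy(values, at, values, at + 1, size - at);
        userIDs[at] = userID;
        itemIDs[at] = itemID;
        values[at] = value;
        size++;
    }

    /** Returns the stored value, or Float.NaN if the pair has no preference. */
    float getPreference(long userID, long itemID) {
        int idx = indexOf(userID, itemID);
        return idx >= 0 ? values[idx] : Float.NaN;
    }

    /** Visits every stored preference in (userID, itemID) order. */
    void forEach(PreferenceVisitor visitor) {
        for (int i = 0; i < size; i++) {
            visitor.visit(userIDs[i], itemIDs[i], values[i]);
        }
    }
}
```

      In this sketch each preference occupies 20 bytes of array payload (two longs and a float), so the Netflix Prize set's roughly 100 million ratings would need on the order of 2 GB before array slack. The trade-off is visible in the code: updating an existing preference is a binary search plus one write, but inserting a new pair shifts the tail of the arrays, which a real implementation would likely need to avoid or amortize.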

        Attachments

        1. InMemoryDataModel.java
          24 kB
          Peng Cheng
        2. InMemoryDataModelTest.java
          6 kB
          Peng Cheng
        3. Semifinal-implementation-added.patch
          32 kB
          Peng Cheng
        4. benchmark.patch
          29 kB
          Gokhan Capan

            People

            • Assignee: Suneel Marthi (smarthi)
            • Reporter: Peng Cheng (peng)

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 336h
                Remaining: 336h
                Logged: Not Specified
