Mahout / MAHOUT-1286

Memory-efficient DataModel, supporting fast online updates and element-wise iteration


    Details

      Description

      Most DataModel implementations in the current CF (collaborative filtering) component use hash maps to enable fast 2D indexing and updates. This is not memory-efficient for big datasets. For example, the Netflix Prize dataset takes 11 GB of heap space when loaded as a FileDataModel.
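      As a rough check on the memory claim above, the arithmetic below compares a compact array layout with the reported heap usage. It assumes details that are not in this issue: the Netflix Prize training set has 100,480,507 ratings, each rating is stored as one 4-byte int item index plus one 4-byte float value in parallel arrays, and "11 GB" is read as 11 GiB of heap. It ignores per-user offsets and array headers, so a real compact model would be somewhat larger.

```java
// Back-of-envelope estimate (assumptions: 100,480,507 ratings; int index +
// float value per rating; "11 GB" of heap interpreted as 11 GiB).
final class NetflixMemoryEstimate {
    static final long RATINGS = 100_480_507L;
    // 4 bytes for the item index + 4 bytes for the rating value
    static final long BYTES_PER_RATING_COMPACT = Integer.BYTES + Float.BYTES;
    static final double GIB = 1024.0 * 1024.0 * 1024.0;

    // Total payload if every rating lives in flat primitive arrays.
    static long compactBytes() {
        return RATINGS * BYTES_PER_RATING_COMPACT;
    }

    // Average heap cost per rating implied by a given total heap size.
    static double observedBytesPerRating(double heapGiB) {
        return heapGiB * GIB / RATINGS;
    }

    public static void main(String[] args) {
        System.out.printf("compact arrays: %.2f GiB%n", compactBytes() / GIB);
        System.out.printf("hash-map model: ~%.0f bytes per rating%n",
                observedBytesPerRating(11));
    }
}
```

      Under these assumptions the flat arrays need about 0.75 GiB, while 11 GiB works out to roughly 118 bytes per rating, which is the per-entry overhead a compact structure would avoid.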

      An improved DataModel implementation should use more compact data structures (such as arrays). This trades a small increase in the time complexity of 2D indexing for a large gain in memory efficiency. In addition, online recommenders, and recommenders converted from online to batch training, will not be affected by this trade-off during training, since they only need element-wise iteration over the preferences rather than random 2D lookups.
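      The trade-off described above can be sketched as follows. This is a minimal, hypothetical illustration, not Mahout's DataModel interface and not the attached InMemoryDataModel.java: the class name CompactPreferenceStore and its methods are invented here, and users are assumed to be dense int indices rather than Mahout's long IDs. Each user row keeps its item IDs sorted in an int[] with ratings in a parallel float[], so a 2D lookup becomes an O(log n) binary search instead of an O(1) hash probe, while element-wise iteration is a plain array scan.

```java
import java.util.Arrays;

// Hypothetical compact preference store: one sorted int[] of item IDs and a
// parallel float[] of ratings per user (users are assumed to be 0..numUsers-1).
final class CompactPreferenceStore {
    private final int[][] itemIDs;  // itemIDs[user] is sorted ascending
    private final float[][] values; // values[user][k] rates itemIDs[user][k]

    interface PreferenceVisitor {
        void visit(int user, int item, float value);
    }

    CompactPreferenceStore(int numUsers) {
        itemIDs = new int[numUsers][];
        values = new float[numUsers][];
        for (int u = 0; u < numUsers; u++) {
            itemIDs[u] = new int[0];
            values[u] = new float[0];
        }
    }

    // 2D lookup via binary search; returns NaN when the user has not rated the item.
    float get(int user, int item) {
        int k = Arrays.binarySearch(itemIDs[user], item);
        return k >= 0 ? values[user][k] : Float.NaN;
    }

    // Online update: overwrite in place if present, else insert keeping the row sorted.
    void set(int user, int item, float value) {
        int[] ids = itemIDs[user];
        float[] vals = values[user];
        int k = Arrays.binarySearch(ids, item);
        if (k >= 0) {
            vals[k] = value; // existing entry: no allocation
            return;
        }
        int pos = -k - 1; // insertion point encoded by binarySearch
        int[] newIds = new int[ids.length + 1];
        float[] newVals = new float[vals.length + 1];
        System.arraycopy(ids, 0, newIds, 0, pos);
        System.arraycopy(vals, 0, newVals, 0, pos);
        newIds[pos] = item;
        newVals[pos] = value;
        System.arraycopy(ids, pos, newIds, pos + 1, ids.length - pos);
        System.arraycopy(vals, pos, newVals, pos + 1, vals.length - pos);
        itemIDs[user] = newIds;
        values[user] = newVals;
    }

    // Element-wise iteration over every (user, item, value) triple, row by row.
    void forEach(PreferenceVisitor visitor) {
        for (int u = 0; u < itemIDs.length; u++) {
            for (int k = 0; k < itemIDs[u].length; k++) {
                visitor.visit(u, itemIDs[u][k], values[u][k]);
            }
        }
    }
}
```

      For example, `store.set(0, 5, 3.0f)` followed by `store.get(0, 5)` returns 3.0, and `store.forEach((u, i, v) -> ...)` visits each stored rating once. In this sketch, updating an existing rating is cheap, but inserting a new one copies the whole row; a fuller implementation might grow row capacity geometrically or buffer new entries to keep online inserts fast.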

        Attachments

        1. benchmark.patch
          29 kB
          Gokhan Capan
        2. InMemoryDataModel.java
          24 kB
          Peng Cheng
        3. InMemoryDataModelTest.java
          6 kB
          Peng Cheng
        4. Semifinal-implementation-added.patch
          32 kB
          Peng Cheng


              People

              • Assignee:
                smarthi Suneel Marthi
                Reporter:
                peng Peng Cheng
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  336h
                  Remaining:
                  336h
                  Logged:
                  Not Specified