[MAHOUT-1286] Memory-efficient DataModel, supporting fast online updates and element-wise iteration - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.9
Fix Version/s: 0.9
Component/s: None
Labels:

Description

Most DataModel implementations in the current CF component use hash maps to enable fast 2D indexing and updates. This is not memory-efficient for large data sets; for example, the Netflix Prize dataset takes 11 GB of heap space as a FileDataModel.

An improved DataModel implementation should use a more compact data structure (such as arrays). This trades a little time complexity in 2D indexing for a vast improvement in memory efficiency. In addition, the training process of any online recommender, or of any online-to-batch converted recommender, will not be affected by this change.
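
To illustrate the trade-off described above, here is a minimal sketch of an array-backed preference store. It is not the attached InMemoryDataModel.java and does not implement Mahout's DataModel interface; the class name CompactPreferenceStore, its methods, and the sorted parallel-array layout are assumptions made purely for illustration. It assumes each (user, item) pair appears at most once in the input.

```java
import java.util.Arrays;

/** Hypothetical illustration of an array-backed preference store (not Mahout code). */
final class CompactPreferenceStore {
  /** Callback for element-wise iteration over all stored preferences. */
  public interface Visitor {
    void visit(long userID, long itemID, float value);
  }

  private final long[] userIDs;  // sorted, distinct user IDs
  private final int[] offsets;   // user u's preferences occupy [offsets[u], offsets[u + 1])
  private final long[] itemIDs;  // item IDs, sorted within each user's range
  private final float[] values;  // preference values, parallel to itemIDs

  /** Builds the store from parallel (user, item, value) arrays of equal length. */
  public CompactPreferenceStore(long[] users, long[] items, float[] prefs) {
    int n = users.length;
    // Sort positions by (user, item). The boxed index array is only build-time overhead.
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      order[i] = i;
    }
    Arrays.sort(order, (a, b) -> {
      int c = Long.compare(users[a], users[b]);
      return c != 0 ? c : Long.compare(items[a], items[b]);
    });

    itemIDs = new long[n];
    values = new float[n];
    long[] distinctUsers = new long[n];
    int[] starts = new int[n + 1];
    int userCount = 0;
    for (int k = 0; k < n; k++) {
      int src = order[k];
      if (k == 0 || users[src] != distinctUsers[userCount - 1]) {
        distinctUsers[userCount] = users[src];  // first preference of a new user
        starts[userCount] = k;
        userCount++;
      }
      itemIDs[k] = items[src];
      values[k] = prefs[src];
    }
    starts[userCount] = n;
    userIDs = Arrays.copyOf(distinctUsers, userCount);
    offsets = Arrays.copyOf(starts, userCount + 1);
  }

  /** Two binary searches replace the hash lookups; returns -1 if the pair is absent. */
  private int indexOf(long userID, long itemID) {
    int u = Arrays.binarySearch(userIDs, userID);
    if (u < 0) {
      return -1;
    }
    int i = Arrays.binarySearch(itemIDs, offsets[u], offsets[u + 1], itemID);
    return i < 0 ? -1 : i;
  }

  /** Returns the stored preference, or NaN if the (user, item) pair is absent. */
  public float getPreference(long userID, long itemID) {
    int i = indexOf(userID, itemID);
    return i < 0 ? Float.NaN : values[i];
  }

  /** Overwrites an existing preference in place; returns false if the pair is absent. */
  public boolean setPreference(long userID, long itemID, float value) {
    int i = indexOf(userID, itemID);
    if (i < 0) {
      return false;
    }
    values[i] = value;
    return true;
  }

  /** Visits every preference in (user, item) order without allocating per-element objects. */
  public void forEach(Visitor visitor) {
    for (int u = 0; u < userIDs.length; u++) {
      for (int k = offsets[u]; k < offsets[u + 1]; k++) {
        visitor.visit(userIDs[u], itemIDs[k], values[k]);
      }
    }
  }
}
```

In this layout each preference costs 12 bytes (an 8-byte item ID plus a 4-byte value) plus a small per-user overhead, and a lookup costs two binary searches instead of two hash probes. The sketch only updates existing preferences in place; inserting new (user, item) pairs online would need an additional mechanism, such as an append buffer that is merged periodically, which is not shown here.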

Attachments

- InMemoryDataModel.java, 24 kB, 12/Aug/13 16:18, Peng Cheng
- InMemoryDataModelTest.java, 6 kB, 12/Aug/13 16:18, Peng Cheng
- Semifinal-implementation-added.patch, 32 kB, 30/Aug/13 01:20, Peng Cheng
- benchmark.patch, 29 kB, 26/Oct/13 14:13, Gokhan Capan

Issue Links

blocks: MAHOUT-1274 SGD-based Online SVD recommender (Closed)

People

Assignee: Suneel Marthi
Reporter: Peng Cheng
Votes: 0
Watchers: 5

Dates

Created: 17/Jul/13 21:57
Updated: 31/Jan/24 22:11
Resolved: 03/Dec/13 03:44

Time Tracking

Estimated: 336h
Remaining: 336h
Logged: Not Specified