[MAHOUT-1069] Multi-target, side-info aware, SGD-based recommender algorithms, examples, and tools to run - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.1
Fix Version/s: None
Component/s: classic
Labels:
- cf
- improvement
- sgd

Description

Following our conversations on the dev list, I have completed merging the recommender algorithms mentioned in http://goo.gl/fh4d9 into Mahout.

This is a set of learning algorithms for matrix-factorization-based recommendation, capable of:

Recommending multiple targets:
1. Numerical Recommendation with OLS Regression
2. Binary Recommendation with Logistic Regression
3. Multinomial Recommendation with Softmax Regression
4. Ordinal Recommendation with Proportional Odds Model
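The first three target types differ mainly in how a raw model score is mapped to a prediction. A minimal sketch of those three mappings follows; the function names are illustrative and not taken from the patch, and the score is assumed to come from some factorization model.

```python
import math


def identity(score):
    """Numerical target (OLS regression): the score is the prediction."""
    return score


def sigmoid(score):
    """Binary target (logistic regression): probability of the positive class."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def softmax(scores):
    """Multinomial target (softmax regression): one probability per class."""
    # Subtracting the maximum keeps the exponentials in a safe range.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The ordinal case needs cut points in addition to the score, so it is not covered by this sketch.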

Leveraging side information in Mahout vector format, where available:
1. User side information
2. Item side information
3. Dynamic side information (side info at feedback moment, such as proximity, day of week etc.)
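One common way for side information to enter a factorization model is as extra linear terms added to the latent-factor dot product. The sketch below assumes that form purely for illustration. It is not taken from the patch, plain Python sequences stand in for Mahout vectors, and every name in it (`side_info_score`, the weight vectors) is hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def side_info_score(user_factors, item_factors,
                    user_side=(), user_weights=(),
                    item_side=(), item_weights=(),
                    dynamic_side=(), dynamic_weights=()):
    """Score one (user, item, context) triple.

    Assumed form: latent-factor dot product plus one linear term per kind
    of side information. Omitted side information contributes zero.
    """
    score = dot(user_factors, item_factors)
    score += dot(user_side, user_weights)        # e.g. age, occupation
    score += dot(item_side, item_weights)        # e.g. genre indicators
    score += dot(dynamic_side, dynamic_weights)  # e.g. day of week, proximity
    return score
```

With no side information the function reduces to the plain factorization score, which matches the "where available" wording above.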

Online learning
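Online learning presumably means the model is updated one feedback event at a time. A minimal sketch of a single SGD update for a plain matrix-factorization model with squared loss and L2 regularization is shown here; the function name, the default rates, and the exact update rule are assumptions, not the patch's code.

```python
def sgd_step(user_factors, item_factors, rating,
             learning_rate=0.01, regularization=0.05):
    """Apply one stochastic gradient step for a single observed rating.

    Returns the updated user factors, updated item factors, and the
    prediction error measured before the update.
    """
    prediction = sum(p * q for p, q in zip(user_factors, item_factors))
    error = rating - prediction
    # Both updates use the factor values from before this step.
    new_user = [p + learning_rate * (error * q - regularization * p)
                for p, q in zip(user_factors, item_factors)]
    new_item = [q + learning_rate * (error * p - regularization * q)
                for p, q in zip(user_factors, item_factors)]
    return new_user, new_item, error
```

Calling `sgd_step` once per incoming rating, in arrival order, gives an online training loop without any batch pass over the data.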

Command-line tools are provided as Mahout jobs, both for pre-experiment utilities and for running experiments.

Evaluation tools for numerical and categorical recommenders are added.

A simple example for the MovieLens-1M data is provided. It achieved good results: 0.851 RMSE on a randomly generated test set, after the learning and regularization rates were tuned on a separate validation set.
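For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. A small sketch of the metric follows; the function name is illustrative and it is not the evaluation tool from the patch.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over paired predictions and true ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one rating is required")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))
```

Lower is better; the value is in the same units as the ratings themselves.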

The existing Mahout code is not modified, except for lines added to driver.class.props for the command-line tools. Even so, the patch is large, with dozens of new source files.

These algorithms are heavily inspired by various influential recommender-system papers, especially Yehuda Koren's. For example, the ordinal model is from Koren's OrdRec paper, except that the cuts are global rather than user-specific.
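To illustrate what global cuts mean in a proportional odds model: a single increasing list of thresholds, shared by all users, turns one real-valued score into a probability for each rating level. The sketch below uses the standard cumulative-logit form; the names and the example thresholds are hypothetical and not taken from the patch.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ordinal_probabilities(score, cuts):
    """Probability of each of len(cuts) + 1 ordered rating levels.

    Proportional odds: P(rating <= k) = sigmoid(cuts[k] - score).
    `cuts` must be strictly increasing and is shared by all users.
    """
    if any(b <= a for a, b in zip(cuts, cuts[1:])):
        raise ValueError("cuts must be strictly increasing")
    cumulative = [sigmoid(c - score) for c in cuts] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)  # P(rating == this level)
        previous = c
    return probabilities


# Hypothetical thresholds for a five-level rating scale.
example = ordinal_probabilities(score=0.3, cuts=[-2.0, -1.0, 0.0, 1.0])
```

A user-specific variant, as in the OrdRec paper, would look up a separate `cuts` list per user instead of sharing one.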

Left for future:

The core algorithms are tested, but there are probably parts that the tests do not cover. I have seen many of them run without problems, and I am going to add new tests regularly.
Not all algorithms have been tried on appropriate datasets, and they may need some improvement. However, I am also using the algorithms for my M.Sc. thesis, so I will eventually submit more experiments. Since the experimentation infrastructure exists, I believe the community may provide more experiments, too.

Attachments

- MAHOUT-1069.patch, 18/Sep/12 12:22, 268 kB, Gokhan Capan
- MAHOUT-1069.patch, 11/Oct/12 13:20, 269 kB, Gokhan Capan

People

Assignee: Unassigned
Reporter: Gokhan Capan
Votes: 3
Watchers: 6

Dates

Created: 18/Sep/12 12:19
Updated: 31/Jan/24 22:14
Resolved: 11/Mar/13 16:28

Time Tracking

Estimated: 168h
Remaining: 168h
Logged: Not Specified