[MAHOUT-906] Allow collaborative filtering evaluators to use custom logic in splitting data set - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.5
Fix Version/s: 0.6
Component/s: None
Labels:
- features

Description

I want to start a discussion about factoring out the logic that splits the data set into training and test sets. Here is how things stand. There are two independent evaluator classes. AbstractDifferenceRecommenderEvaluator splits all the preferences randomly into a training set and a test set. GenericRecommenderIRStatsEvaluator takes one user at a time, removes their top AT preferences, and counts how many of them the system recommends back.
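To make the two existing strategies concrete, here is a minimal sketch in plain Java. It does not use Mahout's API: `Pref`, `Split`, `randomSplit`, and `holdOutTop` are hypothetical names invented for illustration, and the real evaluators may differ in details such as relevance thresholds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class SplitStrategiesSketch {

    /** Hypothetical (user, item, rating) triplet; not a Mahout type. */
    static final class Pref {
        final long userId;
        final long itemId;
        final double rating;

        Pref(long userId, long itemId, double rating) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
        }
    }

    /** Hypothetical holder for the two sides of a split. */
    static final class Split {
        final List<Pref> training = new ArrayList<>();
        final List<Pref> test = new ArrayList<>();
    }

    /** Random split: each preference goes to training with probability trainingFraction. */
    static Split randomSplit(List<Pref> prefs, double trainingFraction, Random random) {
        Split split = new Split();
        for (Pref p : prefs) {
            if (random.nextDouble() < trainingFraction) {
                split.training.add(p);
            } else {
                split.test.add(p);
            }
        }
        return split;
    }

    /** Top-N holdout for one user: the `at` highest-rated preferences become the test set. */
    static Split holdOutTop(List<Pref> userPrefs, int at) {
        List<Pref> sorted = new ArrayList<>(userPrefs);
        // Sort a copy by rating, highest first, so the caller's list is untouched.
        sorted.sort(Comparator.comparingDouble((Pref p) -> p.rating).reversed());
        Split split = new Split();
        for (int i = 0; i < sorted.size(); i++) {
            (i < at ? split.test : split.training).add(sorted.get(i));
        }
        return split;
    }

    public static void main(String[] args) {
        List<Pref> prefs = new ArrayList<>();
        for (int item = 1; item <= 10; item++) {
            prefs.add(new Pref(1L, item, item % 5 + 1));
        }
        Split random = randomSplit(prefs, 0.7, new Random(42L));
        Split top = holdOutTop(prefs, 3);
        System.out.println("random: " + random.training.size() + " training, "
            + random.test.size() + " test");
        System.out.println("top-3 holdout: " + top.training.size() + " training, "
            + top.test.size() + " test");
    }
}
```

In both cases the decision about where a preference lands is hard-wired into the method, which is the logic this issue proposes to factor out.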

I have two use cases, both dealing with temporal dynamics. In the first, there may be expired items that can be used to build a training model but not a test model. In the second, I may want to simulate the behavior of a real system by building a preference matrix from days 1 through k and testing on the ratings the user generated on day k+1. In this case it is not items but preferences (user, item, rating triplets) that may belong only to the training set. Before we discuss an appropriate design, are there any other use cases we need to keep in mind?
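One possible shape for the factored-out logic, sketched in plain Java under stated assumptions: `SplitRule`, `Target`, `TimedPref`, `trainThroughDay`, and `expiredItemsTrainOnly` are hypothetical names, not Mahout types, and the day number stands in for whatever timestamp the data actually carries. Each use case above becomes a rule that assigns a single preference to training, test, or neither.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class TemporalSplitSketch {

    /** Where a single preference ends up. */
    enum Target { TRAINING, TEST, SKIP }

    /** Hypothetical (user, item, rating) triplet tagged with the day it was recorded. */
    static final class TimedPref {
        final long userId;
        final long itemId;
        final double rating;
        final int day;

        TimedPref(long userId, long itemId, double rating, int day) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
            this.day = day;
        }
    }

    /** Hypothetical pluggable rule; an evaluator would call this instead of splitting itself. */
    interface SplitRule {
        Target assign(TimedPref pref);
    }

    /** Days 1..k go to training, day k+1 to test, anything later is skipped. */
    static SplitRule trainThroughDay(int k) {
        return pref -> pref.day <= k ? Target.TRAINING
            : pref.day == k + 1 ? Target.TEST
            : Target.SKIP;
    }

    /**
     * Wraps another rule so that preferences on expired items never reach the test set.
     * Assumption: they are moved to training rather than dropped.
     */
    static SplitRule expiredItemsTrainOnly(Set<Long> expiredItemIds, SplitRule base) {
        return pref -> {
            Target target = base.assign(pref);
            boolean expired = expiredItemIds.contains(pref.itemId);
            return (target == Target.TEST && expired) ? Target.TRAINING : target;
        };
    }

    /** Applies a rule to every preference and fills the two output lists. */
    static void split(List<TimedPref> prefs, SplitRule rule,
                      List<TimedPref> training, List<TimedPref> test) {
        for (TimedPref p : prefs) {
            switch (rule.assign(p)) {
                case TRAINING: training.add(p); break;
                case TEST: test.add(p); break;
                default: break; // SKIP: belongs to neither set
            }
        }
    }

    public static void main(String[] args) {
        List<TimedPref> prefs = new ArrayList<>();
        for (int day = 1; day <= 5; day++) {
            prefs.add(new TimedPref(1L, 100L + day, 4.0, day));
        }
        List<TimedPref> training = new ArrayList<>();
        List<TimedPref> test = new ArrayList<>();
        split(prefs, trainThroughDay(3), training, test);
        System.out.println(training.size() + " training, " + test.size() + " test");
    }
}
```

Moving expired-item preferences into training is only one reading of the first use case; when combined with a day-based rule, skipping them may be the safer choice, since a preference from day k+1 would otherwise leak into the training data.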

Attachments

- MAHOUT-906.patch, 15/Dec/11 11:21, 8 kB, Anatoliy Kats
- MAHOUT-906.patch, 15/Dec/11 13:04, 14 kB, Sean R. Owen
- MAHOUT-906.patch, 16/Dec/11 08:17, 8 kB, Anatoliy Kats
- MAHOUT-906.patch, 16/Dec/11 08:31, 8 kB, Anatoliy Kats
- MAHOUT-906.patch, 20/Dec/11 13:58, 14 kB, Anatoliy Kats

People

Assignee: Sean R. Owen
Reporter: Anatoliy Kats
Votes: 1
Watchers: 0

Dates

Created: 02/Dec/11 11:27
Updated: 31/Jan/24 22:11
Resolved: 29/Dec/11 20:56

Time Tracking

Estimated: 48h
Remaining: 48h
Logged: Not Specified