[S2GRAPH-206] Generalize machine learning model serving. - ASF JIRA

Details

Type: New Feature
Status: Done
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Component/s: s2core
Labels: None

Description

One of the top use cases for an OLTP graph database is, arguably, recommendation.

Let's see how item-based collaborative filtering (item-based CF) can be served as a graph query.

1. Fetch the user's history as edges to clicked items.
2. Fetch each item's similar items.
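The two-step query can be sketched roughly as follows. This is a minimal illustration over in-memory dictionaries, not S2Graph's query API; the names `clicked`, `similar_to`, and `recommend`, the scores, and the sum-of-scores ranking are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical in-memory edge stores standing in for graph storage.
# clicked:    user -> items the user clicked
# similar_to: item -> (similar item, similarity score) pairs
clicked = {"user1": ["itemA", "itemB"]}
similar_to = {
    "itemA": [("itemC", 0.9), ("itemD", 0.4)],
    "itemB": [("itemC", 0.5), ("itemE", 0.7)],
}


def recommend(user, top_n=3):
    """Two-step traversal: user -> clicked items -> similar items."""
    history = clicked.get(user, [])
    scores = Counter()
    for item in history:
        for neighbor, score in similar_to.get(item, []):
            # Skip items the user has already clicked.
            if neighbor not in history:
                scores[neighbor] += score
    # Rank candidates by their summed similarity (an assumed scoring rule).
    return [item for item, _ in scores.most_common(top_n)]
```

Every entry in `similar_to` corresponds to a pre-stored item-item edge, which is where the storage cost of this approach comes from.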

This naive approach has a few problems, since we need to insert many item pairs as edges (on the order of N^2, where N is the total number of items).
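To make the size concrete, assume one directed edge per ordered pair of distinct items. The value N = 10^5 below is an illustrative assumption, not a figure from this issue.

```latex
\[
  \#\text{edges} = N(N-1) \approx N^{2}, \qquad
  N = 10^{5} \;\Rightarrow\; N(N-1) = 9.9999 \times 10^{9} \approx 10^{10}.
\]
```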

Even though bulk loading can update a large number of edges in a stable manner, the user still needs to generate the similarity matrix, which is often very large.

Also, the above approach does not generalize to other model-based approaches.

For example, a user who wants to use matrix factorization needs to work through the following steps.

1. Dump the users' history as raw records.
2. Convert the user history into a matrix by creating a dictionary that maps each raw value to a sequence number.
3. Factorize the user history, usually with alternating least squares (ALS), which yields the factorized model U, I.
4. Run a k-nearest-neighbor search for each item on I, which yields an array of similar item sequences per item sequence.
5. Convert each item sequence and its array of similar item sequences back to an item and an array of similar items, using the dictionary created in step 2.
6. Bulk load the item-item similarities as edges.
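As a rough illustration of the dictionary step and the last three steps, the sketch below uses pure Python. The ALS factorization itself is not shown: `item_factors` is a made-up stand-in for the matrix I, and the item ids, the cosine similarity measure, and all function names are assumptions for the example.

```python
import math

# Dictionary step: map raw item ids to dense sequence numbers and back.
raw_items = ["movie:12", "movie:7", "movie:42"]
item_to_seq = {raw: seq for seq, raw in enumerate(raw_items)}
seq_to_item = {seq: raw for raw, seq in item_to_seq.items()}

# Stand-in for the item factor matrix I that ALS would produce
# (made-up values; row index = item sequence).
item_factors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]


def cosine(a, b):
    """Cosine similarity between two factor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(seq, k):
    """Brute-force k nearest neighbors of one item sequence."""
    scored = [
        (cosine(item_factors[seq], vec), other)
        for other, vec in enumerate(item_factors)
        if other != seq
    ]
    # Highest similarity first; ties broken by the lower sequence number.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def similarity_edges(k):
    """Run kNN per item, then map sequences back to raw item ids."""
    return [
        (seq_to_item[seq], seq_to_item[nbr])
        for seq in range(len(item_factors))
        for nbr in knn(seq, k)
    ]
```

The pairs returned by `similarity_edges` are what would then be bulk loaded as item-item edges.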

Note that these steps become tedious.

I think the above steps can be simplified if S2Graph provides a more general way to serve machine learning models.

Steps 1, 2, and 3 inevitably have to be done by whoever focuses on building better models, but steps 4, 5, and 6 can be automated.

To automate steps 4, 5, and 6, we need to provide ways to load ML models from a remote location and to integrate a pre-loaded ML model into the graph query structure.

So logically, the original query should be changed into the following.

1. Fetch the user's history as edges to clicked items.
2. Convert the clicked items into item sequences.
3. Run the k-nearest-neighbor search on the pre-loaded ML model and get an array of similar item sequences.
4. Convert the array of similar item sequences into an array of similar items using the pre-loaded ML model's dictionary.
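A minimal sketch of this revised query, assuming the model has already been loaded into memory. `PreloadedModel`, `nearest`, and `query` are hypothetical names, not S2Graph APIs, and the brute-force dot-product search stands in for whatever nearest-neighbor index a real deployment would use.

```python
class PreloadedModel:
    """Hypothetical holder for a model loaded once, before queries arrive."""

    def __init__(self, items, factors):
        # The model's dictionary: raw item <-> item sequence.
        self.item_to_seq = {item: seq for seq, item in enumerate(items)}
        self.seq_to_item = list(items)
        # Item factor vectors, indexed by item sequence.
        self.factors = factors

    def nearest(self, seq, k):
        """Brute-force kNN by dot product over all other items."""
        query_vec = self.factors[seq]
        scored = [
            (sum(q * v for q, v in zip(query_vec, vec)), other)
            for other, vec in enumerate(self.factors)
            if other != seq
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [other for _, other in scored[:k]]


def query(user, clicked_edges, model, k=2):
    """Serve similar items on the fly instead of reading stored edges."""
    # Step 1: fetch the user's history from stored edges.
    history = clicked_edges.get(user, [])
    # Step 2: convert clicked items to sequences, skipping unknown items.
    seqs = [model.item_to_seq[i] for i in history if i in model.item_to_seq]
    result = []
    for seq in seqs:
        # Step 3: k-nearest-neighbor search on the pre-loaded model.
        for nbr in model.nearest(seq, k):
            # Step 4: convert the sequence back to a raw item.
            item = model.seq_to_item[nbr]
            if item not in history and item not in result:
                result.append(item)
    return result
```

Only the click history is read from storage here; no item-item edges have to be generated or bulk loaded.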

One might argue that supporting machine learning serving is not S2Graph's focus.

The reason behind this suggestion is that I believe a unified interface for traversing not only pre-stored data as vertices/edges, but also model-generated data on the fly as vertices/edges, can be very useful (not only for collaborative filtering use cases).
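One way to picture such a unified interface is sketched below. It is an assumption about the shape of the idea, not the design adopted in S2Graph: `EdgeFetcher`, `StoredEdgeFetcher`, `ModelEdgeFetcher`, and `traverse` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class EdgeFetcher(Protocol):
    """Anything that can return the neighbors of a vertex."""

    def fetch(self, vertex: str) -> List[str]: ...


class StoredEdgeFetcher:
    """Returns edges that were written to storage ahead of time."""

    def __init__(self, edges: Dict[str, List[str]]):
        self.edges = edges

    def fetch(self, vertex: str) -> List[str]:
        return list(self.edges.get(vertex, []))


class ModelEdgeFetcher:
    """Returns edges computed on the fly by a model function."""

    def __init__(self, predict: Callable[[str], List[str]]):
        self.predict = predict

    def fetch(self, vertex: str) -> List[str]:
        return list(self.predict(vertex))


def traverse(start: str, steps: Iterable[EdgeFetcher]) -> List[str]:
    """Apply each step to the current frontier, keeping first-seen order."""
    frontier = [start]
    for step in steps:
        next_frontier: List[str] = []
        for vertex in frontier:
            for target in step.fetch(vertex):
                if target not in next_frontier:
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier
```

Because both fetchers expose the same `fetch` method, the traversal does not need to know whether a step reads stored edges or calls a model.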

Issue Links

links to

GitHub Pull Request #162

Sub-Tasks

Add README for movielens examples

Done

Do Yung Yoon

People

Assignee: Do Yung Yoon
Reporter: Do Yung Yoon
Votes: 1
Watchers: 3

Dates

Due: 30/Apr/18
Created: 11/Apr/18 23:33
Updated: 14/May/18 12:30
Resolved: 14/May/18 12:30

Time Tracking

Estimated: 672h
Remaining: 672h
Logged: Not Specified
