S2Graph / S2GRAPH-206

Generalize machine learning model serving.

Details

    • Type: New Feature
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • Component/s: s2core

    Description

      One of the top use cases for an OLTP graph database is, arguably, recommendation.

      Let's see how item-based collaborative filtering (item-based CF) can be served as a graph query.

      1. Fetch the user's history as edges to clicked items.
      2. Fetch each clicked item's similar items.
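As a rough illustration of the two steps above, the sketch below runs them against an in-memory toy graph. The `clicked` and `similar_to` labels, the `fetch_edges` helper, and the score-summing rule are assumptions made for this example. They are not S2Graph's query API.

```python
from collections import defaultdict

# Hypothetical toy graph: label -> source vertex -> list of (target, score).
EDGES = {
    "clicked": {"user_1": [("item_a", 1.0), ("item_b", 1.0)]},
    "similar_to": {
        "item_a": [("item_c", 0.9), ("item_d", 0.4)],
        "item_b": [("item_c", 0.5), ("item_e", 0.7)],
    },
}


def fetch_edges(label, src):
    """Return the outgoing (target, score) edges of `src` for one label."""
    return EDGES.get(label, {}).get(src, [])


def recommend(user, limit=10):
    """Step 1: fetch clicked items. Step 2: fetch and merge their similar items."""
    history = fetch_edges("clicked", user)
    seen = {item for item, _ in history}
    scores = defaultdict(float)
    for item, _ in history:
        for similar, score in fetch_edges("similar_to", item):
            if similar not in seen:  # do not recommend already clicked items
                scores[similar] += score
    # Highest score first; ties broken by item id to keep the order stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    print(recommend("user_1"))
```

Every `similar_to` edge in this sketch has to exist in storage before the query runs.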

      This naive approach has a few problems. It requires inserting item pairs as edges, up to N^2 of them, where N is the total number of items.

      Even though bulk load can update a large number of edges in a stable manner, the user still needs to generate the similarity matrix, which is often very large.
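To get a feel for the size involved, the sketch below counts the edges for a given catalog size. The function name and the `top_k` cut-off are assumptions for illustration. The full count uses N(N-1), which is N^2 without self-pairs.

```python
def similarity_edge_counts(num_items, top_k):
    """Return (full, pruned) directed edge counts for an item-item similarity graph."""
    # Every ordered pair of distinct items: N * (N - 1), on the order of N^2.
    full = num_items * (num_items - 1)
    # Keeping only the top_k most similar items per item (hypothetical cut-off).
    pruned = num_items * min(top_k, max(num_items - 1, 0))
    return full, pruned


if __name__ == "__main__":
    full, pruned = similarity_edge_counts(1_000_000, 100)
    print(f"full matrix: {full:,} edges, top-100 per item: {pruned:,} edges")
```

Even with a per-item cut-off, the pruned edge set still has to be recomputed and bulk loaded each time the similarities change.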

      The approach also does not generalize to other model-based approaches.

      For example, a user who wants to use matrix factorization needs to work through the following steps.

      1. Dump the users' history as raw records.
      2. Convert the history into a matrix by creating a dictionary that maps each raw value to a sequence number.
      3. Factorize the matrix, usually with alternating least squares (ALS), which yields the factorized model U, I.
      4. Run k-nearest-neighbor search for each item on I, which yields an array of similar item sequences per item sequence.
      5. Convert each item sequence and its array of similar item sequences back into an item and an array of similar items, using the dictionary created in step 2.
      6. Bulk load the item-item similarities as edges.
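Steps 2, 4, and 5 above can be sketched in plain Python as follows. The item factors are hand-written stand-ins for the matrix I that ALS would produce in step 3. The brute-force cosine search and all function names are assumptions of this example. The resulting pairs are what step 6 would bulk load.

```python
import math


def build_dictionary(raw_items):
    """Step 2: assign each distinct raw value a dense sequence number."""
    to_seq = {}
    for raw in raw_items:
        to_seq.setdefault(raw, len(to_seq))
    to_raw = list(to_seq)  # dicts keep insertion order, so index == sequence
    return to_seq, to_raw


def cosine(u, v):
    """Cosine similarity of two factor vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_nearest(item_factors, seq, k):
    """Step 4: brute-force k nearest neighbours of one item sequence."""
    scored = [
        (cosine(item_factors[seq], vec), other)
        for other, vec in enumerate(item_factors)
        if other != seq
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def similarity_edges(item_factors, to_raw, k):
    """Step 5: map sequences back to raw items, giving (item, similar item) pairs."""
    edges = []
    for seq in range(len(item_factors)):
        for other in k_nearest(item_factors, seq, k):
            edges.append((to_raw[seq], to_raw[other]))
    return edges


# Toy input: raw click records and made-up factors, one row per item sequence.
raw_clicks = ["shoes", "boots", "shoes", "laptop"]
to_seq, to_raw = build_dictionary(raw_clicks)
item_factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edges = similarity_edges(item_factors, to_raw, k=1)

if __name__ == "__main__":
    print(edges)
```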

      These steps quickly become tedious.

      I think these steps can be simplified as follows if S2Graph provides a more general way to serve machine learning models.

      Steps 1, 2, and 3 inevitably fall to whoever focuses on building better models, but steps 4, 5, and 6 can be automated.

      To automate steps 4, 5, and 6, we need to provide ways to load ML models from a remote location and to integrate the pre-loaded ML model into the graph query structure.

      So, logically, the original query should change into the following.

      1. Fetch the user's history as edges to clicked items.
      2. Convert the clicked items into item sequences.
      3. Run k-nearest-neighbor search on the pre-loaded ML model to get an array of similar item sequences.
      4. Convert the array of similar item sequences into an array of similar items using the pre-loaded ML model's dictionary.
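The query-time flow above might look roughly like the sketch below. The JSON layout, the `PreloadedItemModel` class, and the dot-product scoring are all hypothetical. A real integration would fetch the model from a remote location and would likely use an approximate nearest-neighbor index rather than a brute-force scan.

```python
import json

# Hypothetical serialized model: the offline dictionary plus the item factors I.
MODEL_JSON = json.dumps({
    "items": ["shoes", "boots", "laptop", "mouse"],
    "factors": [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]],
})


class PreloadedItemModel:
    """Item factors plus the raw-value <-> sequence dictionary, held in memory."""

    def __init__(self, items, factors):
        self.to_raw = list(items)
        self.to_seq = {raw: seq for seq, raw in enumerate(items)}
        self.factors = factors

    @classmethod
    def from_json(cls, payload):
        data = json.loads(payload)
        return cls(data["items"], data["factors"])

    def knn(self, seq, k):
        """Brute-force top-k item sequences by dot product with item `seq`."""
        query = self.factors[seq]
        scored = [
            (sum(a * b for a, b in zip(query, vec)), other)
            for other, vec in enumerate(self.factors)
            if other != seq
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [other for _, other in scored[:k]]


def similar_items_for_user(history, model, k):
    """Run steps 2 to 4 for a history that step 1 has already fetched."""
    seen = set(history)
    result = []
    for item in history:
        seq = model.to_seq.get(item)        # step 2: raw item -> sequence
        if seq is None:
            continue                        # item unknown to the model
        for other in model.knn(seq, k):     # step 3: k-nearest-neighbor search
            raw = model.to_raw[other]       # step 4: sequence -> raw item
            if raw not in seen and raw not in result:
                result.append(raw)
    return result


if __name__ == "__main__":
    model = PreloadedItemModel.from_json(MODEL_JSON)
    print(similar_items_for_user(["shoes"], model, k=2))
```

In this shape no item-item edges are stored. The similar items are computed from the loaded model when the query runs.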

       

      One might argue that serving machine learning models is not S2Graph's focus.

      I suggest this because I believe a unified interface that traverses not only pre-stored data as vertices and edges, but also model-generated data produced on the fly as vertices and edges, can be very useful, and not only for collaborative filtering use cases.
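One possible shape for such a unified interface is sketched below: a single `fetch` contract that stored edges and model-generated edges both satisfy, so a traversal can chain them without knowing which is which. The `EdgeFetcher`, `StoredEdgeFetcher`, and `ModelEdgeFetcher` names are invented for this example and do not describe S2Graph's actual design.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Edge = Tuple[str, float]  # (target vertex, score)


class EdgeFetcher(Protocol):
    """Anything that can return outgoing edges for a source vertex."""

    def fetch(self, src: str) -> List[Edge]: ...


class StoredEdgeFetcher:
    """Serves edges that were written to storage ahead of time."""

    def __init__(self, adjacency: Dict[str, List[Edge]]):
        self.adjacency = adjacency

    def fetch(self, src: str) -> List[Edge]:
        return list(self.adjacency.get(src, []))


class ModelEdgeFetcher:
    """Serves edges computed on the fly by a pre-loaded model."""

    def __init__(self, predict: Callable[[str], List[Edge]]):
        self.predict = predict

    def fetch(self, src: str) -> List[Edge]:
        return self.predict(src)


def traverse(start: str, steps: List[EdgeFetcher]) -> List[str]:
    """Follow one fetcher per hop and return the vertices reached at the end."""
    frontier = [start]
    for fetcher in steps:
        next_frontier: List[str] = []
        for vertex in frontier:
            for target, _score in fetcher.fetch(vertex):
                if target not in next_frontier:
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier


# Toy data: a stored click edge and a dict standing in for a loaded model.
clicked = StoredEdgeFetcher({"user_1": [("item_a", 1.0)]})
toy_model = {"item_a": [("item_c", 0.9), ("item_d", 0.4)]}
similar = ModelEdgeFetcher(lambda item: toy_model.get(item, []))

if __name__ == "__main__":
    print(traverse("user_1", [clicked, similar]))
```

The caller composes hops the same way whether the edges come from storage or from a model.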

       

            People

              steamshon Do Yung Yoon
              steamshon Do Yung Yoon
              Votes: 1
              Watchers: 3

                Time Tracking

                  Original Estimate: 672h
                  Remaining Estimate: 672h
                  Time Spent: Not Specified