S2Graph / S2GRAPH-226

Provide example Spark jobs that show how to use the WAL.

    Details

    • Type: New Feature
    • Status: Done
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: s2core, s2jobs
    • Labels:
      None

      Description

      Even though s2graph publishes every incoming vertex and edge to Kafka, there is no example showing how to use this WAL.

      I suggest adding a simple example that shows how to process the WAL. Let me explain which use cases this example can benefit.

      At Kakao, s2graph has been used as the fact store, which stores all user activities such as clicking content, buying a product, or issuing a search query.

      [{
      	"timestamp": 1,
      	"elem": "e",
      	"from": "steamshon",
      	"to": "s2graph",
      	"label": "search_query",
      	"props": {}
      }, {
      	"timestamp": 10,
      	"elem": "e",
      	"from": "steamshon",
      	"to": "github.com/apache/incubator-s2graph",
      	"label": "content_click",
      	"props": {}
      }, {
      	"timestamp": 12,
      	"elem": "v",
      	"id": "steamshon",
      	"serviceName": "s2graph",
      	"columnName": "user",
      	"props": {
      		"gender": "M"
      	}
      }]
      

      Each activity (a label, in s2graph terms) forms its own graph, but connecting them all together yields much more information.
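To make the "one graph per label" idea concrete, here is a minimal sketch in plain Python (not a Spark job, and not code from s2jobs). It assumes WAL records shaped like the sample above; the function name `split_by_label` is hypothetical.

```python
import json
from collections import defaultdict

# Sample WAL records copied from the description above.
WAL_JSON = """
[{"timestamp": 1, "elem": "e", "from": "steamshon", "to": "s2graph",
  "label": "search_query", "props": {}},
 {"timestamp": 10, "elem": "e", "from": "steamshon",
  "to": "github.com/apache/incubator-s2graph",
  "label": "content_click", "props": {}},
 {"timestamp": 12, "elem": "v", "id": "steamshon", "serviceName": "s2graph",
  "columnName": "user", "props": {"gender": "M"}}]
"""

def split_by_label(records):
    """Group edge records ("elem" == "e") by label; collect vertex records separately.

    Hypothetical helper for illustration only.
    """
    graphs = defaultdict(list)
    vertices = []
    for rec in records:
        if rec["elem"] == "e":
            # Each label gets its own list of (from, to) pairs, i.e. its own graph.
            graphs[rec["label"]].append((rec["from"], rec["to"]))
        elif rec["elem"] == "v":
            vertices.append(rec)
    return dict(graphs), vertices

graphs, vertices = split_by_label(json.loads(WAL_JSON))
```

With the sample input, `graphs` holds two single-edge graphs, one for `search_query` and one for `content_click`, which share only the `steamshon` endpoint.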

      The edges above can be aggregated into a vertex.

      It is up to users how to connect the graphs. In our case, we used `user` to merge multiple graphs: every activity, such as clicking content, buying a product, or issuing a search query, uses the same `userId` for the same `user`.

      Below is a simple example of the aggregated data.

      {
      	"timestamp": 10,
      	"elem": "v",
      	"id": "steamshon",
      	"serviceName": "s2graph",
      	"columnName": "user",
      	"props": {
      		"gender": "M",
      		"edges": [{
      			"timestamp": 1,
      			"to": "s2graph",
      			"label": "search_query",
      			"props": {}
      		}, {
      			"timestamp": 10,
      			"to": "github.com/apache/incubator-s2graph",
      			"label": "content_click",
      			"props": {}
      		}]
      	}
      }
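
The aggregation described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the proposed Spark job: it assumes an edge belongs to the vertex whose `id` equals the edge's `from` (ignoring `serviceName` and `columnName`), and it keeps the vertex record's own `timestamp`, since the description does not say how the merged timestamp is chosen. The name `aggregate_by_user` is hypothetical.

```python
from collections import defaultdict

# Sample WAL records copied from the description.
WAL_RECORDS = [
    {"timestamp": 1, "elem": "e", "from": "steamshon", "to": "s2graph",
     "label": "search_query", "props": {}},
    {"timestamp": 10, "elem": "e", "from": "steamshon",
     "to": "github.com/apache/incubator-s2graph",
     "label": "content_click", "props": {}},
    {"timestamp": 12, "elem": "v", "id": "steamshon", "serviceName": "s2graph",
     "columnName": "user", "props": {"gender": "M"}},
]

def aggregate_by_user(records):
    """Fold each edge into the vertex whose id matches the edge's "from".

    Assumption: matching on id alone is enough for this sample.
    Edges whose source has no vertex record are dropped in this sketch.
    """
    vertices = {}
    edges = defaultdict(list)
    for rec in records:
        if rec["elem"] == "v":
            vertices[rec["id"]] = rec
        elif rec["elem"] == "e":
            # Keep only the fields shown in the aggregated example.
            edge = {k: rec[k] for k in ("timestamp", "to", "label", "props")}
            edges[rec["from"]].append(edge)

    merged = []
    for vid, vertex in vertices.items():
        out = dict(vertex)
        ordered = sorted(edges.get(vid, []), key=lambda e: e["timestamp"])
        # Copy props so the input record is not mutated.
        out["props"] = dict(vertex["props"], edges=ordered)
        merged.append(out)
    return merged

merged = aggregate_by_user(WAL_RECORDS)
```

In a Spark job the same fold would typically be a group-by on the user key followed by a join with the vertex records; the in-memory dictionaries here stand in for that shuffle.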
      

      This connected graph can be used not only for OLTP but also for OLAP.

      I believe the s2graph WAL is a good way to integrate OLTP and OLAP, and adding this example can help users understand how to leverage it.

      design doc (work in progress)

              People

              • Assignee: steamshon DOYUNG YOON
              • Reporter: steamshon DOYUNG YOON
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: 336h
                  • Remaining Estimate: 336h
                  • Time Spent: Not Specified