  1. Samza
  2. SAMZA-1073

Design top-level fluent API operators that can be deployed in multi-stage jobs

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      It would be nice to let users stay at the logical level when using the fluent API's operators, without having to worry about the physical partitioning of streams or how operators are grouped into one or multiple Samza jobs (SAMZA-1041).

      Hence, the fluent API needs to be able to express physical topics as the boundaries between stages in a single logical DAG.

      In addition, users should be able to use the fluent API to describe a logical expression at the top level, rather than within a job or within a task.
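
One way to picture "physical topics as boundaries between stages" is the rough Java sketch below. It is an illustration only, not Samza code: `StageBoundarySketch`, `Op`, `splitIntoStages`, and the topic names are all hypothetical, and the logical DAG is simplified to a linear chain of operators. An operator whose output goes through a physical topic closes the current stage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; these types are not part of Samza. */
final class StageBoundarySketch {

  /** One logical operator in a linear chain (a simplification of a DAG). */
  static final class Op {
    final String name;
    final String outputTopic; // null: output is handed to the next operator in-process

    Op(String name, String outputTopic) {
      this.name = name;
      this.outputTopic = outputTopic;
    }
  }

  /** Cuts the chain after every operator that writes to a physical topic. */
  static List<List<String>> splitIntoStages(List<Op> chain) {
    List<List<String>> stages = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (Op op : chain) {
      current.add(op.name);
      if (op.outputTopic != null) { // a physical topic marks a stage boundary
        stages.add(current);
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      stages.add(current); // trailing operators with no topic form the last stage
    }
    return stages;
  }

  public static void main(String[] args) {
    List<Op> chain = Arrays.asList(
        new Op("parse", null),
        new Op("partitionBy(userId)", "repartitioned-events"), // assumed topic name
        new Op("window", null),
        new Op("sendTo", "output-topic"));                     // assumed topic name
    // prints [[parse, partitionBy(userId)], [window, sendTo]]
    System.out.println(splitIntoStages(chain));
  }
}
```

Whether each stage then runs as its own Samza job or all stages run inside one job is a deployment decision; the logical description stays the same either way.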

          Activity

          nickpan47 Yi Pan (Data Infrastructure) added a comment -

          Added some early discussion materials for this top-level fluent API. So far, the main points in the design doc are:

          1. Introduce MessageStreamGraph as the representation of the operator DAG.
          2. Keep MessageStream as the programming API class that programmers use to build the DAG.
          3. Introduce MessageStreamApplication as an abstract template class; users implement initGraph() to define the DAG.
          4. Introduce ExecutionEnvironment to carry out the execution of the MessageStreamGraph (i.e. separate the physical deployment from the logical description of the DAG).
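
How the four pieces could fit together is sketched below in Java. Only the names MessageStreamGraph, MessageStream, MessageStreamApplication, initGraph(), and ExecutionEnvironment come from the design points above; every signature and method body, along with `createInStream`, `LocalExecutionEnvironment`, and `WordLengths`, is an assumption made for illustration and is not the API in the design doc or in Samza. The DAG is simplified to a linear chain that runs in-process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch; signatures and bodies are assumptions, not Samza code. */
final class FluentApiSketch {

  /** Representation of the operator DAG (simplified here to a linear chain). */
  static final class MessageStreamGraph {
    final List<String> inputNames = new ArrayList<>();
    final List<Function<Object, Optional<Object>>> steps = new ArrayList<>();

    /** Assumed entry point: registers an input and returns a stream to chain on. */
    <M> MessageStream<M> createInStream(String name) {
      inputNames.add(name);
      return new MessageStream<>(this);
    }
  }

  /** Programming API: each call only records an operator in the graph. */
  static final class MessageStream<M> {
    private final MessageStreamGraph graph;

    MessageStream(MessageStreamGraph graph) {
      this.graph = graph;
    }

    @SuppressWarnings("unchecked")
    <R> MessageStream<R> map(Function<M, R> fn) {
      graph.steps.add(m -> Optional.of(fn.apply((M) m)));
      return new MessageStream<>(graph);
    }

    @SuppressWarnings("unchecked")
    MessageStream<M> filter(Predicate<M> keep) {
      graph.steps.add(m -> Optional.of(m).filter(x -> keep.test((M) x)));
      return this;
    }
  }

  /** Abstract template: the user implements initGraph() to define the DAG. */
  abstract static class MessageStreamApplication {
    abstract void initGraph(MessageStreamGraph graph);
  }

  /** Stand-in for an ExecutionEnvironment: builds the graph, then runs it locally. */
  static final class LocalExecutionEnvironment {
    List<Object> run(MessageStreamApplication app, List<?> input) {
      MessageStreamGraph graph = new MessageStreamGraph();
      app.initGraph(graph); // logical description only, nothing executes yet
      List<Object> out = new ArrayList<>();
      for (Object msg : input) {
        Optional<Object> current = Optional.of(msg);
        for (Function<Object, Optional<Object>> step : graph.steps) {
          current = current.flatMap(step); // an empty Optional means "filtered out"
        }
        current.ifPresent(out::add);
      }
      return out;
    }
  }

  /** Example user code: drop empty words, then map each word to its length. */
  static final class WordLengths extends MessageStreamApplication {
    @Override
    void initGraph(MessageStreamGraph graph) {
      graph.<String>createInStream("words")
          .filter(w -> !w.isEmpty())
          .map(String::length);
    }
  }

  public static void main(String[] args) {
    List<Object> out = new LocalExecutionEnvironment()
        .run(new WordLengths(), Arrays.asList("samza", "", "dag"));
    System.out.println(out); // prints [5, 3]
  }
}
```

The user class never mentions partitions, jobs, or tasks; swapping the local environment for one that deploys to a cluster would not change `WordLengths`.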

          Some user code examples are provided here

          As for the scope of this JIRA, we will pursue stage 1 as described in SAMZA-1041, i.e. a single job for the whole operator DAG. Multi-stage physical jobs should be the responsibility of the ExecutionEnvironment and are not included in the scope of this ticket.

          nickpan47 Yi Pan (Data Infrastructure) added a comment - - edited

          Discussed with Jake Maes and Xinyu Liu. Since the fluent API is relevant to multiple projects (i.e. SAMZA-1041, SAMZA-1063), we will start a branch, samza-fluent-api-v1, to share the code for development. The following is the tentative order of commits to this shared branch:

          1. Start the open-source branch samza-fluent-api-v1
          2. Merge the top-level APIs and examples on top of Jagadish's window API commit
          3. Move the classes from samza-operator to samza-core and remove the samza-operator module
          4. Jacob's stream spec patch (this can proceed in parallel with 1 and 2)
          5. Xinyu's execution environment patch
          6. Boris and Navina's standalone branch
          githubbot ASF GitHub Bot added a comment -

          GitHub user nickpan47 opened a pull request:

          https://github.com/apache/samza/pull/51

          SAMZA-1073: top-level fluent API

          `Initial draft of top-level fluent API for operator DAGs

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/samza samza-fluent-api-v1

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/samza/pull/51.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #51


          commit 373048aa0a68221af5f6b5589bbe161c972b11a9
          Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>
          Date: 2017-02-09T09:56:10Z

          SAMZA-1073: top-level fluent API
          `


          nickpan47 Yi Pan (Data Infrastructure) added a comment -

          The pull request is uploaded here: https://github.com/apache/samza/pull/51

          githubbot ASF GitHub Bot added a comment -

          GitHub user asfgit closed the pull request at:

          https://github.com/apache/samza/pull/51

          githubbot ASF GitHub Bot added a comment -

          GitHub user nickpan47 opened a pull request:

          https://github.com/apache/samza/pull/55

          SAMZA-1073: Remove operator module. Move all classes into samza-core

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/nickpan47/samza remove-operator-module

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/samza/pull/55.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #55



          githubbot ASF GitHub Bot added a comment -

          GitHub user nickpan47 closed the pull request at:

          https://github.com/apache/samza/pull/55

          nickpan47 Yi Pan (Data Infrastructure) added a comment -

          Code submitted and merged into master. Closing. Thanks!


            People

            • Assignee: nickpan47 Yi Pan (Data Infrastructure)
            • Reporter: nickpan47 Yi Pan (Data Infrastructure)
            • Votes: 0
            • Watchers: 3
