Comdev GSOC / GSOC-258

[GSOC][Beam] Build out Beam Yaml features


Details

    Description

      Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. Beam recently added support for launching jobs defined in Yaml, alongside its other SDKs. This project would focus on adding more features and transforms to the Yaml SDK so that it becomes the easiest way to define data pipelines.
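
      For orientation, a Yaml pipeline is a declarative list of transforms rather than SDK code. The fragment below is a minimal sketch of a linear (chain) pipeline in the style of the Yaml SDK docs. The file paths and the column name `col3` are placeholders, and the transform names and config keys should be checked against the current docs before use.

      ```yaml
      # Hypothetical example: read CSV files, keep matching rows, write JSON.
      pipeline:
        type: chain            # each transform consumes the previous one's output
        transforms:
          - type: ReadFromCsv
            config:
              path: /path/to/input*.csv     # placeholder input path
          - type: Filter
            config:
              language: python
              keep: "col3 > 100"            # placeholder predicate on an assumed column
          - type: WriteToJson
            config:
              path: /path/to/output.json    # placeholder output path
      ```

      New transforms added under objective 1 would appear as additional `type:` entries of this kind.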

      Objectives:
      1. Add support for existing Beam transforms (IOs, Machine Learning transforms, and others) to the Yaml SDK
      2. Add end-to-end pipeline use cases using the Yaml SDK
      3. (stretch) Add Yaml SDK support to the Beam playground

      Useful links:
      Apache Beam repo - https://github.com/apache/beam
      Yaml SDK code + docs - https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml
      Open issues for the Yaml SDK - https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Ayaml


          People

            Assignee: Unassigned
            Reporter: Danny McCormick (damccorm)
