Comdev GSOC / GSOC-258

[GSOC][Beam] Build out Beam Yaml features

Details

    Description

      Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, together with a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. Beam recently added support for launching jobs defined in Yaml, alongside its other SDKs. This project would focus on adding more features and transforms to the Yaml SDK so that it becomes the easiest way to define data pipelines.
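
      For context, a pipeline in the Yaml SDK is a declarative chain of transforms rather than SDK code. A minimal sketch, assuming the transform names and config keys described in the Beam Yaml docs (ReadFromCsv, Filter, WriteToJson); the file paths and the column name `amount` are placeholders and do not come from this issue:

      ```yaml
      # Hypothetical pipeline spec: read CSV rows, keep large amounts, write JSON.
      pipeline:
        type: chain              # each transform feeds the next one in order
        transforms:
          - type: ReadFromCsv
            config:
              path: /path/to/input*.csv      # placeholder input location
          - type: Filter
            config:
              language: python
              keep: "amount > 100"           # 'amount' is an assumed column name
          - type: WriteToJson
            config:
              path: /path/to/output.json     # placeholder output location
      ```

      Assuming the spec is saved as pipeline.yaml, it would typically be launched with something like `python -m apache_beam.yaml.main --yaml_pipeline_file=pipeline.yaml`; the exact flag name may differ between Beam releases, so check the docs linked below.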

      Objectives:
      1. Add support for existing Beam transforms (IOs, Machine Learning transforms, and others) to the Yaml SDK
      2. Add end-to-end pipeline use cases using the Yaml SDK
      3. (stretch) Add Yaml SDK support to the Beam playground

      Useful links:
      Apache Beam repo - https://github.com/apache/beam
      Yaml SDK code + docs - https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml
      Open issues for the Yaml SDK - https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Ayaml

          People

            Assignee: Unassigned
            Reporter: Danny McCormick (damccorm)