[GSOC-258] [GSOC][Beam] Build out Beam Yaml features - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Labels:
- Beam
- gsoc
- gsoc2024
- mentor

Description

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. Beam recently added support for launching jobs using Yaml, on top of its other SDKs. This project would focus on adding more features and transforms to the Yaml SDK so that it can be the easiest way to define data pipelines.

Objectives:
1. Add support for existing Beam transforms (IOs, machine learning transforms, and others) to the Yaml SDK
2. Add end-to-end pipeline use cases using the Yaml SDK
3. (stretch) Add Yaml SDK support to the Beam Playground
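
For context, a pipeline written with the Yaml SDK looks roughly like the sketch below. This is an illustration only, modeled on the syntax in the Yaml SDK docs; the file paths and the column name col3 are hypothetical placeholders, not part of this issue. Each entry under transforms names an existing Beam transform, which is the kind of coverage Objective 1 would extend.

```yaml
# Illustrative sketch only: paths and the column name are hypothetical.
pipeline:
  type: chain            # each transform feeds the next, in order
  transforms:
    - type: ReadFromCsv  # IO transform: read rows from CSV files
      config:
        path: /path/to/input*.csv
    - type: Filter       # keep only rows matching a Python expression
      config:
        language: python
        keep: "col3 > 100"
    - type: WriteToJson  # IO transform: write the remaining rows as JSON
      config:
        path: /path/to/output.json
```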

Useful links:
Apache Beam repo - https://github.com/apache/beam
Yaml SDK code + docs - https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml
Open issues for the Yaml SDK - https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Ayaml

People

Assignee: Unassigned
Reporter: Danny McCormick
Votes: 1
Watchers: 2

Dates

Created: 02/Feb/24 21:15
Updated: 05/Apr/24 12:40