[GSOC-259] [GSOC][Beam] Build out Beam Use Cases - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Labels:
- Beam
- gsoc
- gsoc2024
- mentor

Description

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. In addition to its lower-level primitives, Beam provides several higher-level transforms for machine learning and general data processing use cases. This project focuses on identifying and implementing real-world use cases that use these transforms.

Objectives:
1. Add real-world use cases demonstrating Beam's MLTransform for preprocessing data and generating embeddings.
2. Add real-world use cases demonstrating Beam's Enrichment transform for enriching existing data with data from a slowly changing source.
3. (Stretch) Implement one or more additional "enrichment handlers" for interacting with currently unsupported sources.
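
To make the enrichment objectives more concrete, here is a minimal plain-Python sketch of the general pattern: look up extra fields for each record by key, and cache the lookups because the source changes slowly. It deliberately does not use Beam's Enrichment API. `CachedLookupHandler`, `enrich`, the `fetch` callback, and the `ttl_seconds` parameter are all hypothetical names chosen for illustration, and a real handler would follow the interface defined in the Enrichment code linked below.

```python
import time
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, Tuple

Record = Dict[str, Any]


class CachedLookupHandler:
    """Hypothetical stand-in for an enrichment handler (NOT Beam's API).

    Wraps a lookup function and caches each result for `ttl_seconds`,
    which is reasonable when the source changes slowly.
    """

    def __init__(
        self,
        fetch: Callable[[Any], Optional[Record]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, cached value).
        self._cache: Dict[Any, Tuple[float, Optional[Record]]] = {}

    def __call__(self, key: Any) -> Optional[Record]:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # Entry is still fresh: skip the source call.
        value = self._fetch(key)
        self._cache[key] = (now + self._ttl, value)
        return value


def enrich(
    records: Iterable[Record], handler: CachedLookupHandler, key_field: str
) -> Iterator[Record]:
    """Yield each record merged with the fields looked up for its key."""
    for record in records:
        extra = handler(record[key_field]) or {}
        # Fields already on the record take precedence over looked-up ones.
        yield {**extra, **record}
```

A usage example under the same assumptions: with `handler = CachedLookupHandler(lambda k: {"name": "widget"})`, calling `list(enrich([{"id": 1, "qty": 2}], handler, "id"))` yields one record carrying `id`, `qty`, and `name`. In a Beam pipeline, the equivalent lookup and join would be performed by the Enrichment transform with a source-specific handler.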

Useful links:
Apache Beam repo - https://github.com/apache/beam
MLTransform docs - https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/
Enrichment code - https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/enrichment.py
Enrichment docs (should be published soon) - https://github.com/apache/beam/pull/30187

People

Assignee: Unassigned

Reporter: Danny McCormick

Votes: 2

Watchers: 2

Dates

Created: 02/Feb/24 21:17

Updated: 05/Apr/24 12:39