Details
- Type: Improvement
- Status: Open
- Priority: P3
- Resolution: Unresolved
Description
Just as we can infer a Beam Schema from a NamedTuple type, we should have support for inferring a schema from a protobuf-generated Python type.
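For context, the existing NamedTuple inference essentially reads the class's field annotations and maps each field name to a type. A stdlib-only sketch of that idea (this is an illustration, not Beam's actual implementation; `Person` and `infer_schema` are hypothetical names):

```python
import typing


class Person(typing.NamedTuple):
    id: int
    name: str
    email: str


def infer_schema(nt_type):
    # Map each annotated field name to its Python type,
    # analogous to how a Beam Schema is derived from a NamedTuple.
    return dict(typing.get_type_hints(nt_type))


schema = infer_schema(Person)
# schema == {"id": int, "name": str, "email": str}
```

A protobuf-generated class carries equivalent information in its message descriptor, so an analogous mapping from descriptor fields to schema fields should be possible.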
This should integrate well with the rest of the schema infrastructure. For example, it should be possible to use schema-aware transforms like SqlTransform, Select, or beam.dataframe.convert.to_dataframe on a PCollection annotated with a protobuf type. Using the addressbook_pb2 module from the protobuf tutorial:

```python
import addressbook_pb2
import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe

pc = (input_pc
      | beam.Map(create_person).with_output_types(addressbook_pb2.Person))

# Deferred dataframe with fields id, name, email, ...
df = to_dataframe(pc)

# OR
pc | beam.transforms.sql.SqlTransform(
    "SELECT name FROM PCOLLECTION WHERE email = 'foo@bar.com'")
```
Issue Links
- relates to: BEAM-8732 Add support for mapping additional structured types to Python Schemas (Open)
- split to: BEAM-13150 Integrate TFRecord/tf.train.Example with Beam Schemas and the DataFrame API (Open)