BEAM-12955 (project: Beam)

Add support for inferring Beam Schemas from Python protobuf types

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels: None

    Description

      Just as we can already infer a Beam schema from a NamedTuple type, we should support inferring a schema from a protobuf-generated Python type.
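      One possible approach, sketched here under stated assumptions: translate a message's fields into a NamedTuple type, then reuse the existing NamedTuple inference. In practice the field list would be read from the generated class's descriptor (e.g. addressbook_pb2.Person.DESCRIPTOR.fields); here it is passed in as plain (name, type, repeated) tuples so the sketch runs without protobuf installed. message_to_namedtuple and the type mapping are hypothetical, not an existing Beam API, and collapsing int32/float to int/float discards width information that a real implementation would probably preserve.

```python
from typing import NamedTuple, Sequence

# Hypothetical mapping from .proto scalar type names to Python types that
# Beam's NamedTuple schema inference can already handle. Width-specific types
# (int32, float, ...) are collapsed to int/float for simplicity.
_PROTO_TO_PYTHON = {
    "double": float, "float": float,
    "int32": int, "int64": int, "uint32": int, "uint64": int,
    "sint32": int, "sint64": int, "fixed32": int, "fixed64": int,
    "sfixed32": int, "sfixed64": int,
    "bool": bool, "string": str, "bytes": bytes,
    "enum": int,  # assumption: enums are represented by their numeric value
}


def message_to_namedtuple(type_name, fields):
    """Build a NamedTuple type mirroring a protobuf message's fields.

    fields: list of (name, proto_type, repeated) tuples, where proto_type is a
    scalar type name or, for a nested message, another list of such tuples.
    """
    annotations = []
    for name, proto_type, repeated in fields:
        if isinstance(proto_type, list):
            # Nested message: recurse to build a nested NamedTuple type.
            py_type = message_to_namedtuple(f"{type_name}_{name}", proto_type)
        else:
            py_type = _PROTO_TO_PYTHON[proto_type]
        if repeated:
            # Repeated fields become sequences (an array field in a Beam schema).
            py_type = Sequence[py_type]
        annotations.append((name, py_type))
    return NamedTuple(type_name, annotations)


# Field layout of the tutorial's Person message, written out by hand.
Person = message_to_namedtuple("Person", [
    ("name", "string", False),
    ("id", "int32", False),
    ("email", "string", False),
    ("phones", [("number", "string", False), ("type", "enum", False)], True),
])

print(Person._fields)  # ('name', 'id', 'email', 'phones')
```

      Going through a NamedTuple keeps the proto-specific logic small: everything after the field translation is handled by the inference path Beam already has.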

      Schema inference for protobuf types should integrate with the rest of the schema infrastructure: it should be possible to apply schema-aware transforms such as SqlTransform, Select, or beam.dataframe.convert.to_dataframe to a PCollection whose element type is annotated as a protobuf message class. For example, using the addressbook_pb2 example from the protobuf Python tutorial:

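      import addressbook_pb2  # generated by protoc from the tutorial's addressbook.proto
      
      import apache_beam as beam
      from apache_beam.dataframe.convert import to_dataframe
      from apache_beam.transforms.sql import SqlTransform
      
      # input_pc and create_person are placeholders: an existing PCollection and
      # a function that returns addressbook_pb2.Person messages.
      pc = (input_pc | beam.Map(create_person).with_output_types(addressbook_pb2.Person))
      
      df = to_dataframe(pc)  # deferred dataframe with fields id, name, email, ...
      
      # OR
      
      pc | SqlTransform("SELECT name FROM PCOLLECTION WHERE email = 'foo@bar.com'")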
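      Schema-aware transforms work on row values, so supporting protobuf element types would likely also require converting each message into a row. A minimal sketch, assuming rows are represented as NamedTuples (hand-written here for the tutorial's Person and PhoneNumber messages) and using SimpleNamespace as a stand-in for a real addressbook_pb2.Person so the example runs without protobuf installed; message_to_row is a hypothetical helper, not an existing Beam API.

```python
import collections.abc
from types import SimpleNamespace
from typing import NamedTuple, Sequence, get_args, get_origin


class PhoneNumber(NamedTuple):
    number: str
    type: int  # assumption: enum stored as its numeric value


class Person(NamedTuple):
    name: str
    id: int
    email: str
    phones: Sequence[PhoneNumber]


def message_to_row(message, row_type):
    """Hypothetical helper: copy a message's fields into an instance of row_type."""
    values = {}
    for field_name, field_type in row_type.__annotations__.items():
        value = getattr(message, field_name)
        if get_origin(field_type) is collections.abc.Sequence:
            (element_type,) = get_args(field_type)
            if hasattr(element_type, "_fields"):
                # Repeated nested message: convert each element recursively.
                value = [message_to_row(v, element_type) for v in value]
            else:
                # Repeated scalar: copy the container into a plain list.
                value = list(value)
        elif hasattr(field_type, "_fields"):
            # Singular nested message: convert recursively.
            value = message_to_row(value, field_type)
        values[field_name] = value
    return row_type(**values)


# SimpleNamespace stands in for an addressbook_pb2.Person instance.
message = SimpleNamespace(
    name="Ada", id=1, email="foo@bar.com",
    phones=[SimpleNamespace(number="555-0100", type=1)],
)
row = message_to_row(message, Person)
print(row)
# Person(name='Ada', id=1, email='foo@bar.com', phones=[PhoneNumber(number='555-0100', type=1)])
```

      Reading fields with getattr relies only on attribute access and iteration over repeated fields, both of which generated protobuf message classes provide.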
      


People

    • Assignee: Unassigned
    • Reporter: Brian Hulette (bhulette)
    • Votes: 0
    • Watchers: 4
