BEAM-3771

Unable to write using AvroIO without schema

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: io-java-avro
    • Labels: None

    Description

      I am working on a specific use case where I don't know the schema while writing a PCollection of GenericRecords to the file system. Here's how the use case works:

      • My Dataflow pipeline listens to a Pub/Sub subscription and receives messages in this format:
        // {"schema" : <schema_id>, "payload" : "<payload>"}
        
      • It then extracts the schema id, looks it up in the schema registry, and gets the schema for that specific element
      • The payload is then deserialised into a GenericRecord
      • The PCollection of these records is forwarded to the BigQuery writer, which writes it to BigQuery
      • The same PCollection is then passed to a storage writer that writes to the file system using AvroIO
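
      The first two steps above (extracting the schema id and looking up the schema) can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the reporter's actual code: the class name EnvelopeDecoder, the assumption that the schema id is numeric, and the use of a plain function as a stand-in for the schema registry client are all hypothetical. A real pipeline would use a JSON library and a real registry client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits the message envelope and caches schema lookups by id. */
final class EnvelopeDecoder {
    // Matches {"schema" : 42, "payload" : "..."}; assumes a numeric schema id.
    private static final Pattern ENVELOPE = Pattern.compile(
        "\\{\\s*\"schema\"\\s*:\\s*(\\d+)\\s*,\\s*\"payload\"\\s*:\\s*\"(.*)\"\\s*\\}",
        Pattern.DOTALL);

    // Stand-in for a schema registry client: maps a schema id to a schema definition.
    private final Function<Integer, String> registry;
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();

    EnvelopeDecoder(Function<Integer, String> registry) {
        this.registry = registry;
    }

    private static Matcher match(String message) {
        Matcher m = ENVELOPE.matcher(message);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a schema/payload envelope: " + message);
        }
        return m;
    }

    /** Returns the schema id embedded in the envelope. */
    static int schemaId(String message) {
        return Integer.parseInt(match(message).group(1));
    }

    /** Returns the raw payload string embedded in the envelope. */
    static String payload(String message) {
        return match(message).group(2);
    }

    /** Looks up the schema for a message, calling the registry at most once per id. */
    String schemaFor(String message) {
        return cache.computeIfAbsent(schemaId(message), registry);
    }
}
```

      Caching by schema id keeps registry traffic proportional to the number of distinct schemas rather than the number of messages.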

      Now, I am struggling with the last step, as AvroIO expects a schema, whereas I do not know the schema at compile time. All I have is a bunch of elements, each with a schema id embedded.

      Is there any way for AvroIO to write the records to the file system without a schema? If not, are there any alternative formats I could use to write to the file system?
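
      For context, an Avro container file records a single writer schema in its header, so mixed-schema elements cannot share one output file. One possible direction, which this ticket does not confirm as the resolution, is to partition elements by schema id so that each output has exactly one schema, resolved at runtime. The sketch below shows only that grouping idea in plain Java; SchemaPartitioner, bySchemaId, and the file name pattern are hypothetical names, and the Beam wiring itself is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

/** Hypothetical helper: groups elements so that each group shares one schema id. */
final class SchemaPartitioner {
    /** Groups elements by the schema id extracted from each one, preserving input order per group. */
    static <T> Map<Integer, List<T>> bySchemaId(List<T> elements, ToIntFunction<T> schemaIdOf) {
        Map<Integer, List<T>> groups = new TreeMap<>();
        for (T element : elements) {
            groups.computeIfAbsent(schemaIdOf.applyAsInt(element), k -> new ArrayList<>())
                  .add(element);
        }
        return groups;
    }

    /** Assumed naming scheme: one output file per schema id. */
    static String fileNameFor(int schemaId) {
        return "records-schema-" + schemaId + ".avro";
    }
}
```

      Each group can then be written with the schema fetched for its id, so no schema has to be known at compile time.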

          People

            Assignee: Chamikara Madhusanka Jayalath (chamikara)
            Reporter: Darshan Mehta (darshanmehta2)