Details
Type: Bug
Status: Resolved
Priority: P2
Resolution: Duplicate
Description
I am working on a use case where the schema is not known at the time a PCollection of GenericRecords is written to the file system. Here's how the use case works:
- My Dataflow pipeline listens to a Pub/Sub subscription and receives messages in this format:
// {"schema" : <schema_id>, "payload" : "<payload>"}
- It then extracts the schema id, looks it up in the schema registry, and gets the schema for that specific element
- The payload is then deserialised into GenericRecord
- The PCollection of these records is forwarded to the BigQuery writer and written to BigQuery
- The same PCollection is then passed to a storage writer that writes it to the file system using AvroIO
I am struggling with the last step: AvroIO expects a schema up front, whereas I do not know the schema at compile time. All I have is a collection of elements, each with its schema id embedded.
Is there any way for AvroIO to write the records to the file system without a schema? If not, are there alternative formats I could use to write to the file system?
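To make the shape of the data concrete, here is a minimal sketch of the envelope handling described above, written in plain Python with no Beam dependency. It only illustrates splitting each message into schema id and payload and bucketing the payloads by schema id, so that each bucket could in principle be written with a single schema. The names `SCHEMA_REGISTRY`, `parse_envelope`, and `group_by_schema` are hypothetical, the registry is an in-memory stand-in, the schema ids are assumed to be integers, and the payload is treated as an opaque string; none of this is the actual pipeline code.

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-in for the schema registry:
# schema id -> Avro schema expressed as a plain dict.
SCHEMA_REGISTRY = {
    1: {"type": "record", "name": "User",
        "fields": [{"name": "name", "type": "string"}]},
    2: {"type": "record", "name": "Click",
        "fields": [{"name": "url", "type": "string"}]},
}


def parse_envelope(message: bytes):
    """Split a message body into (schema_id, payload).

    Assumes the envelope is UTF-8 JSON of the form
    {"schema": <schema_id>, "payload": "<payload>"}.
    """
    envelope = json.loads(message.decode("utf-8"))
    return envelope["schema"], envelope["payload"]


def group_by_schema(messages):
    """Bucket payloads by schema id, preserving arrival order per bucket."""
    buckets = defaultdict(list)
    for message in messages:
        schema_id, payload = parse_envelope(message)
        if schema_id not in SCHEMA_REGISTRY:
            raise KeyError(f"unknown schema id: {schema_id}")
        buckets[schema_id].append(payload)
    return dict(buckets)


# Example messages; the payloads are placeholders, not real serialized records.
messages = [
    b'{"schema": 1, "payload": "alice"}',
    b'{"schema": 2, "payload": "https://example.com"}',
    b'{"schema": 1, "payload": "bob"}',
]
grouped = group_by_schema(messages)
```

Each key of `grouped` corresponds to one schema that is only known at runtime, which is exactly the information the file-writing step would need per output file.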