Details
-
Bug
-
Status: Open
-
P2
-
Resolution: Unresolved
-
2.37.0
-
None
Description
As a Staff Software Engineer at Forgerock developing Beam applications,
I want Direct Runner to work the same as Google Dataflow with Flex Templates,
So that Beam applications can be developed and tested locally with a good developer experience.
This is a high-level report that may need to be broken down into more specific reports.
var stream = pipeline // Extract - Transform - Load, Lather Rinse Repeat //.apply("Extract from Source", PubsubIO.readStrings().fromSubscription(inputSubscription).withTimestampAttribute("date")) .apply("Extract from Source", PubsubIO.readStrings().fromSubscription(inputSubscription)) .apply("Transform to TableRow", ParDo.of(new TransformToTableRow())) .apply("Define Window", Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(1))).withAllowedLateness(Duration.standardMinutes(10)).accumulatingFiredPanes()) .apply("Transform to JSON", MapElements.into(TypeDescriptors.strings()).via(GenericJson::toString)); stream.apply("Load to PubSub", PubsubIO.writeStrings().to(sinkTopic)); stream.apply("Load to GCS", TextIO.write().to(sinkUri).withWindowedWrites().withNumShards(1));
This code works as expected running under Google Dataflow as a Flex Template.
When running locally under Direct Runner there are two problems
- .withTimestampAttribute("date") fails to find the "date" attribute
- Exception in thread "main" java.lang.IllegalArgumentException: PubSub message is missing a value for timestamp attribute date
- No results are written to GCS
- However, results are written to the local file system as temporary files
- Results are written to PubSub correctly
More context can be provided on request...