Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
Currently it is not possible to use multiple joins in the same pipeline without wrapping each of them in its own PTransform, because the joins would otherwise generate name clashes.
Consider the following test case:
@Test
public void testMultipleJoinsInSamePipeline() {
  leftListOfKv.add(KV.of("Key2", 4L));
  PCollection<KV<String, Long>> leftCollection =
      p.apply("CreateLeft", Create.of(leftListOfKv));

  rightListOfKv.add(KV.of("Key2", "bar"));
  PCollection<KV<String, String>> rightCollection =
      p.apply("CreateRight", Create.of(rightListOfKv));

  expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));

  PCollection<KV<String, KV<Long, String>>> output1 =
      Join.innerJoin(leftCollection, rightCollection);
  PCollection<KV<String, KV<Long, String>>> output2 =
      Join.innerJoin(leftCollection, rightCollection);

  PAssert.that(output1).containsInAnyOrder(expectedResult);
  PAssert.that(output2).containsInAnyOrder(expectedResult);

  p.run();
}
This fails because the two joins register clashing names in the pipeline, and the join library currently offers no way to give each join a different name.
As a result, I find myself routinely wrapping joins in new PTransforms, which leads me to believe that naming support should be part of the library itself.
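To make the clash and the wrapping workaround concrete, here is a minimal stdlib-only sketch. It is a toy model, not Beam's API or implementation: `ToyPipeline`, `register`, `join`, and the step names `CoGroupByKey` and `ParDo` are all hypothetical stand-ins chosen for illustration. It assumes that a pipeline rejects a step whose full name is already registered, and that wrapping a join in a named scope prefixes the names of its inner steps.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for a pipeline that rejects duplicate step names (hypothetical, not Beam's API). */
final class ToyPipeline {
    private final Set<String> stepNames = new HashSet<>();

    /** Registers a step under its full name; fails if that name is already taken. */
    String register(String fullName) {
        if (!stepNames.add(fullName)) {
            throw new IllegalStateException("Duplicate step name: " + fullName);
        }
        return fullName;
    }

    /** Models a library join that always uses the same hard-coded inner step names. */
    void join(String scope) {
        // An empty scope models calling the join directly, without a wrapper.
        String prefix = scope.isEmpty() ? "" : scope + "/";
        register(prefix + "CoGroupByKey");
        register(prefix + "ParDo");
    }

    public static void main(String[] args) {
        ToyPipeline p = new ToyPipeline();
        p.join("");            // first unscoped join registers its steps
        try {
            p.join("");        // second unscoped join reuses the same names
        } catch (IllegalStateException e) {
            System.out.println("clash: " + e.getMessage());
        }
        p.join("Join1");       // a named scope prefixes the inner step names
        p.join("Join2");       // a different scope name, so no clash
        System.out.println("scoped joins registered");
    }
}
```

In this model, the caller-supplied scope name plays the role that the wrapping PTransform plays in the workaround described above. A name parameter on the join itself would let callers provide that scope without writing a wrapper.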