BEAM-6719

Allow multiple Joins in the same pipeline

Details

    Description

      Currently, it is not possible to use multiple joins in the same pipeline without wrapping each one in its own PTransform, because the joins would otherwise generate clashing names.

      Consider the following test case:

      @Test
      public void testMultipleJoinsInSamePipeline() {
        // leftListOfKv, rightListOfKv, expectedResult and p are presumably fields of the enclosing test class.
        leftListOfKv.add(KV.of("Key2", 4L));
        PCollection<KV<String, Long>> leftCollection = p.apply("CreateLeft", Create.of(leftListOfKv));
        rightListOfKv.add(KV.of("Key2", "bar"));
        PCollection<KV<String, String>> rightCollection = p.apply("CreateRight", Create.of(rightListOfKv));
        expectedResult.add(KV.of("Key2", KV.of(4L, "bar")));

        // Two identical joins applied directly to the same pipeline.
        PCollection<KV<String, KV<Long, String>>> output1 = Join.innerJoin(leftCollection, rightCollection);
        PCollection<KV<String, KV<Long, String>>> output2 = Join.innerJoin(leftCollection, rightCollection);

        PAssert.that(output1).containsInAnyOrder(expectedResult);
        PAssert.that(output2).containsInAnyOrder(expectedResult);
        p.run();
      }
      

      This fails because the two joins produce clashing names in the pipeline, and the join library currently offers no way to give each join a different name.

      As a result, I find myself routinely wrapping joins in new PTransforms, which leads me to believe that this should be part of the library itself.
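
The wrapping workaround described above might look roughly like the following. This is a minimal sketch, not part of the join library: the class name `NamedInnerJoin` and its constructor are hypothetical, and it assumes the Beam Java SDK and the join-library extension are on the classpath. The right-hand collection is passed through the constructor because a PTransform takes a single input.

```java
import org.apache.beam.sdk.extensions.joinlibrary.Join;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

/**
 * Hypothetical wrapper (not provided by the join library) that scopes an
 * inner join under whatever name the caller passes to apply().
 */
public class NamedInnerJoin<K, V1, V2>
    extends PTransform<PCollection<KV<K, V1>>, PCollection<KV<K, KV<V1, V2>>>> {

  // The right side is only needed while the pipeline graph is built,
  // so it is marked transient and excluded from serialization.
  private final transient PCollection<KV<K, V2>> right;

  public NamedInnerJoin(PCollection<KV<K, V2>> right) {
    this.right = right;
  }

  @Override
  public PCollection<KV<K, KV<V1, V2>>> expand(PCollection<KV<K, V1>> left) {
    // The transforms applied inside Join.innerJoin are nested under
    // this wrapper's name in the pipeline graph.
    return Join.innerJoin(left, right);
  }
}
```

With such a wrapper, the two joins from the test case could be applied as `leftCollection.apply("Join1", new NamedInnerJoin<String, Long, String>(rightCollection))` and `leftCollection.apply("Join2", new NamedInnerJoin<String, Long, String>(rightCollection))`, so that each join sits under its own name.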

       


            People

              Assignee: Unassigned
              Reporter: Daniel Mescheder (DanielMe)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h