Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-6407

regression: FileIO.writeDynamic() with side inputs fails in DirectRunner

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 2.9.0
    • Fix Version/s: 2.10.0
    • Component/s: runner-direct
    • Labels:
    • Flags:
      Important

      Description

      When FileIO.writeDynamic is used with automatic sharding and  a Contextful.Fn that uses side inputs for the file naming, DirectRunner (and TestPipeline) fail with: 

      java.lang.IllegalStateException: All PCollectionViews that are consumed must be written by some WriteView PTransform: Missing [<unnamed> [RunnerPCollectionView]]

       

      Example code:  

      PCollectionView<String> outputFileName =
         pipeline.apply(
            "outputDir",
             Create.of("/tmp/testout")).apply(View.asSingleton());
      
      Contextful.Fn<String, FileIO.Write.FileNaming> manifestNaming =
         (element, c) ->
            (window, pane, numShards, shardIndex, compression) -> 
               c.sideInput(outputFileName)+shardIndex;
      
      pipeline.apply(FileIO.<String, String>writeDynamic()
         .by(SerializableFunctions.constant(""))
         .withDestinationCoder(StringUtf8Coder.of())
         .via(TextIO.sink())
         .withTempDirectory("/tmp")
         .withNaming(Contextful.of(
            manifestNaming,
            Requirements.requiresSideInputs(outputFileName))));
      

       

      This does not occur in Dataflow-runner

      It does not occur if the ContextFul.Fn is not given side inputs.

      It does not occur if withNumShards(1) is set.

      It did not occur in 2.8.0, and does in 2.9.0 and 2.10.0-SNAPSHOT (as of today)

       

      The cause appears to be due to the DirectRunner using TransformOverrides re-writing FileIO sinks to use runner-determined-sharding

      ( see DirectRunner.java line 226 )

       but I do not know why this started occuring in 2.9.0...

        Attachments

        1. beam-filewriter-demo.tgz
          2 kB
          Niel Markwick

          Issue Links

            Activity

              People

              • Assignee:
                nielm Niel Markwick
                Reporter:
                nielm Niel Markwick
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 2.5h
                  2.5h