Beam / BEAM-8140

Python API: PTransform should be immutable and reusable

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels: None

    Description

      While the Java API seems fine, the Python API is counterintuitive, to say the least.

      Consider the following example:

      import apache_beam as beam

      p1 = beam.Pipeline()
      p2 = beam.Pipeline()
      node = 'ReadTrainData' >> beam.io.ReadFromText("/tmp/aaa.txt")
      p1 | node
      p2 | node  # fails here

      The code above fails because node somehow remembers that it was already attached to p1. Unlike in Java, the | (apply) operator is defined on the PTransform (through its __ror__ method) rather than on the pipeline.
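      The mechanism can be reproduced without Beam. The sketch below is a hypothetical toy model, not Beam's implementation: ToyPipeline and StatefulTransform are made-up names, and it assumes the transform caches the first pipeline it is applied to, as the failure above suggests. Because ToyPipeline defines no __or__, Python falls back to the right operand's __ror__, so the transform object decides what p | node does.

```python
# Hypothetical toy model of the reported behavior; not Beam's actual code.


class ToyPipeline:
    """Stand-in for beam.Pipeline; records the labels applied to it."""

    def __init__(self, name):
        self.name = name
        self.applied = []  # labels of transforms applied to this pipeline


class StatefulTransform:
    """Toy transform that remembers the pipeline it was first attached to."""

    def __init__(self, label):
        self.label = label
        self.pipeline = None  # mutable state stored on the transform itself

    def __ror__(self, left):
        # Called for `left | self` because ToyPipeline defines no __or__.
        if self.pipeline is not None and self.pipeline is not left:
            raise ValueError(
                f"{self.label!r} is already attached to {self.pipeline.name}")
        self.pipeline = left
        left.applied.append(self.label)
        return left


p1 = ToyPipeline("p1")
p2 = ToyPipeline("p2")
node = StatefulTransform("ReadTrainData")
p1 | node  # works, and node now remembers p1
try:
    p2 | node  # fails: node is still bound to p1
except ValueError as err:
    print(err)  # 'ReadTrainData' is already attached to p1
```

      In this toy model, building a fresh transform object for each pipeline sidesteps the stored state; whether that workaround is sufficient in Beam itself is an assumption worth checking.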

      If anything should be mutable here, it is only the pipeline object.
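      One possible reading of this proposal, sketched under the same toy assumptions (ImmutableTransform and MutablePipeline are hypothetical names, not Beam's API): the transform is an immutable value, here a frozen dataclass, and the pipeline owns all mutable state by implementing __or__ itself. Python tries the left operand's __or__ first, so the transform never learns which pipeline it was applied to and can be reused freely.

```python
# Hypothetical sketch of the proposed design; not Beam's actual API.
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmutableTransform:
    """A plain, immutable description of work; holds no pipeline reference."""

    label: str
    path: str


class MutablePipeline:
    """Toy pipeline that keeps every application in its own graph."""

    def __init__(self, name):
        self.name = name
        self.graph = []  # (label, transform) pairs owned by this pipeline

    def __or__(self, transform):
        # Python calls this first because the left operand defines __or__,
        # so the pipeline, not the transform, records the application.
        self.graph.append((transform.label, transform))
        return self


node = ImmutableTransform("ReadTrainData", "/tmp/aaa.txt")
p1 = MutablePipeline("p1")
p2 = MutablePipeline("p2")
p1 | node
p2 | node  # succeeds: node carries no reference to p1
```

      Keeping the operator on the pipeline side is closer to the Java SDK, where apply is a method of Pipeline and PCollection rather than of the transform.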

          People

            Assignee: Unassigned
            Reporter: Chris Suchanek (chris_suchanek)
            Votes: 0
            Watchers: 4