Beam / BEAM-10409

Add combiner packing to graph optimizer phases

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • None
    • Missing
    • Component/s: runner-core

    Description

      Some uses of Beam (e.g. TensorFlow Transform) create thousands of Combine stages that share a common parent. Such a large number of stages can cause performance issues on some runners. To alleviate this, a graph optimization phase could be added to the translations module that packs compatible Combine stages into a single stage.

      The graph optimization for CombinePerKey would work as follows: if several CombinePerKey stages share a common input, and each has exactly one input and one output, pack them into a single stage that runs all of the CombinePerKeys and writes the resulting tuples to a new PCollection. A subsequent stage unpacks the tuples from this PCollection and sends each element to the corresponding original output PCollection.
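
The pack-then-unpack idea can be sketched in plain Python, without the Beam SDK. This is only an illustrative model under stated assumptions: the names `COMBINERS`, `packed_combine_per_key`, and `unpack` are hypothetical, dictionaries stand in for PCollections, and nothing here reflects how the translations module is actually implemented.

```python
from collections import defaultdict

# Hypothetical stand-ins for three CombinePerKey stages that read the same input.
# Each entry is (stage_name, function_combining_a_list_of_values).
COMBINERS = [
    ("sum", sum),
    ("min", min),
    ("max", max),
]


def packed_combine_per_key(pairs, combiners):
    """Single packed stage: group once, run every combiner, emit one tuple per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {
        key: tuple(fn(values) for _, fn in combiners)
        for key, values in grouped.items()
    }


def unpack(packed, combiners):
    """Follow-up stage: route element i of each tuple to the i-th original output."""
    outputs = {name: {} for name, _ in combiners}
    for key, results in packed.items():
        for (name, _), result in zip(combiners, results):
            outputs[name][key] = result
    return outputs


# Toy keyed input shared by all three combiners.
pairs = [("a", 1), ("a", 5), ("b", 2), ("b", 3), ("a", 4)]
packed = packed_combine_per_key(pairs, COMBINERS)
outputs = unpack(packed, COMBINERS)
```

In this model the input is grouped once instead of three times, and each entry of `outputs` matches what the corresponding unpacked combiner would have produced on its own.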

      Supporting CombineGlobally raises an additional issue: because each CombinePerKey stage is separated from the input stage by its own intermediate KeyWithVoid stage, the CombinePerKey stages do not share a common input stage and cannot be packed. To support CombineGlobally, a common sibling elimination graph optimization phase can be used to merge the KeyWithVoid stages. After this, the CombinePerKey stages have a common input and can be packed.
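
Common sibling elimination can be illustrated with a small stand-alone model of a stage graph. This is a sketch under assumptions: the `Stage` record and the `eliminate_common_siblings` function are hypothetical and not part of Beam, stages are matched only by a `kind` label and their input, and stages of the same kind are assumed to compute the same thing, so merging them is safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    kind: str    # label only, e.g. "KeyWithVoid" or "CombinePerKey"
    input: str   # name of the input PCollection
    output: str  # name of the output PCollection


def eliminate_common_siblings(stages, kind):
    """Merge stages of `kind` that read the same input into one surviving stage.

    Consumers of a dropped duplicate's output are rewired to the survivor's output.
    """
    survivor_output = {}  # input PCollection -> output of the surviving stage
    renamed = {}          # dropped output PCollection -> survivor's output
    kept = []
    for stage in stages:
        if stage.kind == kind:
            if stage.input in survivor_output:
                renamed[stage.output] = survivor_output[stage.input]
                continue  # drop the duplicate sibling
            survivor_output[stage.input] = stage.output
        kept.append(stage)
    # Rewire every remaining stage that consumed a dropped output.
    return [
        Stage(s.name, s.kind, renamed.get(s.input, s.input), s.output)
        for s in kept
    ]


# Two CombineGlobally expansions, each with its own KeyWithVoid stage.
stages = [
    Stage("KeyWithVoid_1", "KeyWithVoid", "pcoll", "keyed_1"),
    Stage("Combine_1", "CombinePerKey", "keyed_1", "out_1"),
    Stage("KeyWithVoid_2", "KeyWithVoid", "pcoll", "keyed_2"),
    Stage("Combine_2", "CombinePerKey", "keyed_2", "out_2"),
]
optimized = eliminate_common_siblings(stages, "KeyWithVoid")
combine_inputs = {s.input for s in optimized if s.kind == "CombinePerKey"}
```

After the pass, both CombinePerKey stages in this toy graph read from `keyed_1`, which is the precondition the packing phase needs.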

          People

            Assignee: Yifan Mai (myffical@gmail.com)
            Reporter: Yifan Mai (myffical@gmail.com)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 13h