BEAM-9451

Optimize translation when Schema information is available in Spark Structured Streaming runner

    Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: runner-spark

      Description

      The Spark Structured Streaming runner supports Datasets that already carry Schema information, which Spark uses to optimize jobs (via Catalyst). This issue is to implement optimized translations of the runner's transforms so that we can benefit from the performance improvements Spark applies internally.

      Note that we may also need to map Beam's core internal representations, such as WindowedValue, so that intermediate steps can be optimized as well.
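
      To illustrate the idea of mapping a windowed element to named columns rather than an opaque binary blob, here is a minimal, dependency-free sketch. It is not Beam or Spark code: `SchemaAwareSketch`, `WindowedElement`, `toRow`, and `project` are hypothetical names invented for this example, the column names are assumptions, and pane information is left out. The point is only that once the value fields and window metadata are visible as columns, an optimizer can prune or filter on them without deserializing the whole element.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaAwareSketch {

  /** Hypothetical stand-in for a windowed element: a schema'd value plus window metadata. */
  static final class WindowedElement {
    final Map<String, Object> fields; // field name -> field value of the element
    final long timestampMillis;
    final long windowStartMillis;
    final long windowEndMillis;

    WindowedElement(Map<String, Object> fields, long timestampMillis,
                    long windowStartMillis, long windowEndMillis) {
      this.fields = fields;
      this.timestampMillis = timestampMillis;
      this.windowStartMillis = windowStartMillis;
      this.windowEndMillis = windowEndMillis;
    }
  }

  /** Flattens an element into one row of named columns (assumed column naming). */
  static Map<String, Object> toRow(WindowedElement e) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("timestamp", e.timestampMillis);
    row.put("windowStart", e.windowStartMillis);
    row.put("windowEnd", e.windowEndMillis);
    for (Map.Entry<String, Object> f : e.fields.entrySet()) {
      row.put("value." + f.getKey(), f.getValue());
    }
    return row;
  }

  /** Column pruning: reads only the requested columns, in the requested order. */
  static List<Object> project(Map<String, Object> row, List<String> columns) {
    List<Object> out = new ArrayList<>();
    for (String c : columns) {
      out.add(row.get(c));
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> fields = new LinkedHashMap<>();
    fields.put("user", "alice");
    fields.put("clicks", 3);
    WindowedElement e = new WindowedElement(fields, 1500L, 0L, 60000L);

    Map<String, Object> row = toRow(e);
    System.out.println(row.keySet());
    System.out.println(project(row, List.of("value.user", "windowEnd")));
  }
}
```

      In an actual runner translation, the row would presumably be described by a Spark schema derived from the Beam Schema, which this sketch does not attempt.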

            People

            • Assignee: Unassigned
            • Reporter: iemejia Ismaël Mejía
            • Votes: 0
            • Watchers: 3
