Beam / BEAM-17

Add support for new Beam Source API

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: runner-spark
    • Labels: None

    Description

      The API is discussed in https://cloud.google.com/dataflow/model/sources-and-sinks#creating-sources

      To implement this, we need to add support for com.google.cloud.dataflow.sdk.io.Read in TransformTranslator. This can be done by creating a new SourceInputFormat class that translates a Dataflow (DF) Source into a Hadoop InputFormat. The two concepts are closely aligned, since both are built around splits and readers.
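      The split/reader correspondence described above can be sketched without any SDK or Hadoop dependencies. This is a minimal, hypothetical sketch under stated assumptions, not the actual Spark runner code: SimpleSource, SimpleReader, SourceSplit, SourceRecordReader, RangeSource and readAll are illustrative stand-ins, and the real Dataflow Source and Hadoop InputFormat/RecordReader classes take extra arguments (pipeline options, job and task contexts) that are omitted here. The part worth noting is the reader adapter: a Dataflow-style reader is positioned with start() and then advance(), while a Hadoop-style reader is driven by repeated nextKeyValue() calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class SourceInputFormatSketch {
  // Hypothetical stand-in for a Dataflow Source: it can split itself and create
  // a reader. The real SDK methods take extra arguments such as pipeline options.
  interface SimpleSource<T> {
    List<? extends SimpleSource<T>> split(int desiredSplits);
    SimpleReader<T> createReader();
  }

  // Hypothetical reader protocol, loosely modeled on start()/advance()/getCurrent().
  interface SimpleReader<T> {
    boolean start();   // position on the first record; false if there is none
    boolean advance(); // position on the next record; false when exhausted
    T getCurrent();
  }

  // Plays the role of a Hadoop InputSplit: it wraps one sub-source.
  static final class SourceSplit<T> {
    final SimpleSource<T> source;
    SourceSplit(SimpleSource<T> source) { this.source = source; }
  }

  // Plays the role of the proposed SourceInputFormat: splits come from the
  // source's own split(), and each split is read with its sub-source's reader.
  static final class SourceInputFormat<T> {
    private final SimpleSource<T> source;
    SourceInputFormat(SimpleSource<T> source) { this.source = source; }

    List<SourceSplit<T>> getSplits(int numSplits) {
      List<SourceSplit<T>> splits = new ArrayList<>();
      for (SimpleSource<T> sub : source.split(numSplits)) {
        splits.add(new SourceSplit<>(sub));
      }
      return splits;
    }

    SourceRecordReader<T> createRecordReader(SourceSplit<T> split) {
      return new SourceRecordReader<>(split.source.createReader());
    }
  }

  // Adapts start()/advance() to a Hadoop-style nextKeyValue() loop: the first
  // call maps to start(), every later call maps to advance().
  static final class SourceRecordReader<T> {
    private final SimpleReader<T> reader;
    private boolean started = false;
    SourceRecordReader(SimpleReader<T> reader) { this.reader = reader; }

    boolean nextKeyValue() {
      if (!started) {
        started = true;
        return reader.start();
      }
      return reader.advance();
    }
    T getCurrentValue() { return reader.getCurrent(); }
  }

  // Hypothetical toy source over the integers [from, to), used only to exercise the adapter.
  static final class RangeSource implements SimpleSource<Integer> {
    private final int from, to;
    RangeSource(int from, int to) { this.from = from; this.to = to; }

    @Override public List<RangeSource> split(int desiredSplits) {
      int n = Math.max(1, desiredSplits);
      int size = Math.max(1, (to - from + n - 1) / n); // ceiling division
      List<RangeSource> parts = new ArrayList<>();
      for (int start = from; start < to; start += size) {
        parts.add(new RangeSource(start, Math.min(to, start + size)));
      }
      return parts;
    }

    @Override public SimpleReader<Integer> createReader() {
      return new SimpleReader<Integer>() {
        private int current = from - 1; // not yet positioned
        @Override public boolean start() { current = from; return current < to; }
        @Override public boolean advance() { current++; return current < to; }
        @Override public Integer getCurrent() {
          if (current < from || current >= to) throw new NoSuchElementException();
          return current;
        }
      };
    }
  }

  // Reads every split in turn; done sequentially here for illustration only.
  static <T> List<T> readAll(SourceInputFormat<T> format, int numSplits) {
    List<T> out = new ArrayList<>();
    for (SourceSplit<T> split : format.getSplits(numSplits)) {
      SourceRecordReader<T> rr = format.createRecordReader(split);
      while (rr.nextKeyValue()) out.add(rr.getCurrentValue());
    }
    return out;
  }

  public static void main(String[] args) {
    SourceInputFormat<Integer> format = new SourceInputFormat<>(new RangeSource(0, 10));
    System.out.println(format.getSplits(3).size() + " splits: " + readAll(format, 3));
  }
}
```

      Running main splits the toy range into three sub-sources and prints every record read through the adapter. In the Spark runner the splits would instead become partitions read by separate tasks, so the adapter only has to handle one split per reader.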

      Note that once DF has a native HadoopSource, the translation of Read will need to special-case it, since the underlying InputFormat can then be used directly.

      This could be tested using XmlSource from the SDK.

            People

              Assignee: Amit Sela (amitsela)
              Reporter: Amit Sela (amitsela)
              Votes: 1
              Watchers: 3