Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-ideas
    • Labels: None

      Activity

      sinisa_lyh Neville Li added a comment - edited

      I'm working on porting Scio to Beam. Can this be assigned to me?
      https://github.com/spotify/scio/tree/apache-beam
      https://github.com/nevillelyh/incubator-beam/tree/scio

      kenn Kenneth Knowles added a comment -

      Here you go! I will go ahead and make the name a little more precise, since right now it is more of a vague wish.

      sinisa_lyh Neville Li added a comment -

      I have ported 2 modules so far:

      • scio-core into sdks/scala/core; this is the core Scala DSL
      • scio-test into sdks/scala/core; this includes utilities for writing idiomatic Scala tests, plus the tests for scio-core

      The question is: is sdks/scala the right place, or should we move it to another top-level module, e.g. dsls/scio?

      jbonofre Jean-Baptiste Onofré added a comment -

      Awesome! Thanks. As discussed by e-mail, I have started testing it.

      jbonofre Jean-Baptiste Onofré added a comment -

      Resuming tests and changes on Scio.

      yuchaoran2011 Chaoran Yu added a comment -

      It looks like Scio currently supports only Google Cloud Dataflow as the underlying runner. Now that the project has been donated to Beam, are there any plans to support Spark, Flink and other runners?

      amitsela Amit Sela added a comment -

      Scio currently supports the Dataflow SDK (more or less Beam's predecessor). Once it supports Beam, it should be able to work with any runner that supports the Java SDK, since Scio is a Scala DSL running on top of the Java SDK.

      yuchaoran2011 Chaoran Yu added a comment -

      Thanks, Amit, for the clarification. Any idea which Beam release the Scio integration could be finished for?

      amitsela Amit Sela added a comment -

      Davor Bonaci, where are we with the Scio integration?

      jbonofre Jean-Baptiste Onofré added a comment -

      I updated to the 0.4.0 release, and I will coordinate with Neville on the merge.

      amitsela Amit Sela added a comment -

      You mean 0.5.0?

      sinisa_lyh Neville Li added a comment -

      WIP branch here using 0.4.0: https://github.com/spotify/scio/tree/apache-beam
      Ticket: https://github.com/spotify/scio/issues/279

      amitsela Amit Sela added a comment -

      Oh, got it, thanks!

      nehalecky Nicholaus E Halecky added a comment -

      Hi all! It's wonderful to see the progress made here so far. What is the current status of this effort?

      jbonofre Jean-Baptiste Onofré added a comment -

      Neville Li updated the branch to Beam 0.6.0, so I think we can discuss merging it into the Apache codebase after a little cleanup. Thoughts?

      sinisa_lyh Neville Li added a comment -

      We prefer to keep it separate for now, mainly for logistical reasons:

      • we use SBT with lots of custom logic
      • we release very often, once every 1-2 weeks
      • we monkey-patch Beam bugs and test the fixes in our production jobs before the upstream Beam release
      • we use a lightweight collaboration model, mainly just GitHub issues & PRs
      • there are only 3 Scio developers at Spotify supporting 150+ internal users and many external ones, all running on Dataflow

      However, I also want to point out that nothing should stop those interested from trying it out or contributing:

      • we decoupled the Dataflow runner as much as possible
      • Scio should run on other runners without modification; it is just a matter of changing dependencies and arguments
      • there are still parts coupled with GCP and the Dataflow runner, but hopefully we can gradually decouple them as the file system and other related APIs improve
      • it'd be great to see bug reports and PRs from the community
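
      For readers wondering what "changing dependencies and arguments" might look like in practice, here is a rough sbt sketch. It is not taken from the Scio repository: the version values are placeholders, and runner artifact names differ between Beam releases, so check them against the release you actually target.

      ```scala
      // build.sbt (illustrative sketch only)
      // Assumption: both version strings are placeholders to be replaced with
      // a published Scio release and the Beam version it was built against.
      val scioVersion = "<scio-version>"
      val beamVersion = "<beam-version>"

      libraryDependencies ++= Seq(
        "com.spotify" %% "scio-core" % scioVersion,
        // Swap this runner artifact to change where the pipeline executes.
        // For Dataflow it would instead be:
        //   "org.apache.beam" % "beam-runners-google-cloud-dataflow-java" % beamVersion
        "org.apache.beam" % "beam-runners-direct-java" % beamVersion
      )
      ```

      The runner itself is then selected at launch time through Beam's standard pipeline option, for example --runner=DirectRunner, with no change to the pipeline code.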

        People

        • Assignee: Unassigned
        • Reporter: jbonofre Jean-Baptiste Onofré
        • Votes: 1
        • Watchers: 14

          Dates

          • Created:
            Updated:
