Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-5378

Ensure all Go SDK examples run successfully

Details

    • Bug
    • Status: Triage Needed
    • P3
    • Resolution: Fixed
    • Not applicable
    • 2.35.0
    • sdk-go
    • None

    Description

      I've been spending a day or so running through the example available for the Go SDK in order to see what works and on what runner (direct, dataflow), and what doesn't and here's the results.

      All available examples for the go sdk. For me as a new developer on apache beam and dataflow it would be a tremendous value to have all examples running because many of them have legitimate use-cases behind them.

      ├── complete
      │   └── autocomplete
      │       └── autocomplete.go
      ├── contains
      │   └── contains.go
      ├── cookbook
      │   ├── combine
      │   │   └── combine.go
      │   ├── filter
      │   │   └── filter.go
      │   ├── join
      │   │   └── join.go
      │   ├── max
      │   │   └── max.go
      │   └── tornadoes
      │       └── tornadoes.go
      ├── debugging_wordcount
      │   └── debugging_wordcount.go
      ├── forest
      │   └── forest.go
      ├── grades
      │   └── grades.go
      ├── minimal_wordcount
      │   └── minimal_wordcount.go
      ├── multiout
      │   └── multiout.go
      ├── pingpong
      │   └── pingpong.go
      ├── streaming_wordcap
      │   └── wordcap.go
      ├── windowed_wordcount
      │   └── windowed_wordcount.go
      ├── wordcap
      │   └── wordcap.go
      ├── wordcount
      │   └── wordcount.go
      └── yatzy
          └── yatzy.go
      
      

      All examples that are supposed to be runnable by the direct driver (not depending on gcp platform services) are runnable.

      On the otherhand these are the tests that needs to be updated because its not runnable on the dataflow platform for various reasons.
      I tried to figure them out and all I can do is to pin point at least where it fails since my knowledge so far in the beam / dataflow internals is limited.

      .
      ├── complete
      │   └── autocomplete
      │   └── autocomplete.go

      Runs successfully if swapping the input to one of the shakespear data files from gs://
      But when running this it yields a error from the top.Largest func (discussed in another issue that top.Largest needs to have a serializeable combinator / accumulator)

      ➜ autocomplete git:(master) ✗ ./autocomplete --project fair-app-213019 --runner dataflow --staging_location=gs://fair-app-213019/staging-test2 --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
      2018/09/11 15:35:26 Running autocomplete
      Unable to encode combiner for lifting: failed to encode custom coder: bad underlying type: bad field type: bad element: unencodable type: interface {}2018/09/11 15:35:26 Using running binary as worker binary: './autocomplete'
      2018/09/11 15:35:26 Staging worker binary: ./autocomplete

      ├── contains
      │   └── contains.go

      Fails when running debug.Head for some mysterious reason, might have to do with the param passing into the x,y iterator. Frankly I dont know and could not figure.
      But removing the debug.Head call everything works as expected and succeeds.

      ├── cookbook
      │   ├── combine
      │   │   └── combine.go

      https://github.com/apache/beam/pull/6474

      │   ├── filter
      │   │   └── filter.go

      Fails go-job-1-1536673624017210012
      2018-09-11 (15:47:13) Output i0 for step was not found.

      │   ├── join
      │   │   └── join.go

      Working as expected! Whey!

      │   ├── max
      │   │   └── max.go

      Working!

      │   └── tornadoes
      │   └── tornadoes.go

      Working!

      ├── debugging_wordcount
      │   └── debugging_wordcount.go

      Works fine!

      ├── forest
      │   └── forest.go

      Bazinga, all good!

      ├── grades
      │   └── grades.go

      So great!

      ├── minimal_wordcount
      │   └── minimal_wordcount.go

      Runs only on direct, implemented PR https://github.com/apache/beam/pull/6386

      ├── multiout
      │   └── multiout.go

      Runs like a boss!

      ├── pingpong
      │   └── pingpong.go

      Stating it can't run on dataflow
      // NOTE(herohde) 2/23/2017: Dataflow does not allow cyclic composite structures.

      ├── streaming_wordcap
      │   └── wordcap.go

      Brilliant!

      ├── windowed_wordcount
      │   └── windowed_wordcount.go

      All good!

      ├── wordcap
      │   └── wordcap.go

      Runs fine on direct runner but not on dataflow because of input is local and is Using
      textio.Immediate, hence not able to pass in a gs:// path

      This is a won't fix

      ├── wordcount
      │   └── wordcount.go

      All good!

      └── yatzy
      └── yatzy.go

      All good!

      Attachments

        Activity

          People

            lostluck Robert Burke
            ptomasroos Tomas Roos
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 4h 40m
                4h 40m