Beam / BEAM-11378

Cannot run Python PortableRunner on EMR cluster

Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: runner-spark
    • Labels: None

    Description

      I have been trying to run the Python word-count example on an AWS EMR cluster, and it does not work.

      Things I have tried:

      • Running with:
        python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner

        This implicitly runs with --spark-master-url local[4], which defeats the purpose of running on a cluster.

      • Running with:
        python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner --spark-master-url=yarn

        This still uses the local master.
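One possible explanation for the second attempt, offered only as an assumption: if the runner declares its option with underscores (assumed here to be --spark_master_url) and parses arguments with argparse's parse_known_args, then the hyphenated --spark-master-url is not matched, is left over as an unknown argument, and the default local[4] applies. The sketch below uses only the standard library to show that argparse behaviour; build_parser and parse are hypothetical helpers, not Beam's actual option-parsing code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mimicking a runner that declares its Spark
    master option with underscores. This is NOT Beam's real parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--runner", default="DirectRunner")
    # Assumed flag name; the default mirrors the local[4] fallback
    # described in the report.
    parser.add_argument("--spark_master_url", default="local[4]")
    return parser


def parse(argv):
    """Return (recognised options, leftover args), as parse_known_args does."""
    known, leftover = build_parser().parse_known_args(argv)
    return known, leftover


if __name__ == "__main__":
    # Hyphenated spelling: argparse matches option strings exactly (or by
    # prefix), so this flag is not recognised and the default is kept.
    opts, rest = parse(["--runner=SparkRunner", "--spark-master-url=yarn"])
    print(opts.spark_master_url, rest)

    # Underscore spelling: recognised, so the value is used.
    opts, rest = parse(["--runner=SparkRunner", "--spark_master_url=yarn"])
    print(opts.spark_master_url, rest)
```

Because parse_known_args returns unrecognised flags instead of raising an error, a misspelled option can fail silently, which would match the "still uses local master" symptom. Whether this is what happens inside Beam would need to be confirmed against Beam's own option definitions.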

      So is there no way to run Python Beam code on a YARN Spark cluster?
      That would also mean there is no way to run TFX code (which uses Beam) on a YARN cluster.


          People

            Assignee: Unassigned
            Reporter: Ratul Ray (ratulray)
            Votes: 0
            Watchers: 3
