Beam / BEAM-1801

default_job_name can generate names not accepted by Dataflow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      The default job name generated by:

      https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L288

      is partially derived from the OS username of the executing user. Usernames may contain characters not accepted by Dataflow (the dot in "dennis.docter" below, for example), resulting in errors like:

      apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
      (a1b878f3562c0e6d): Error processing pipeline. Causes: (a1b878f3562c04ae): Prefix for cluster 'beamapp-dennis.docter-032-03231324-1edc-harness' should match '[a-z]([-a-z0-9]{0,61}[a-z0-9])?'. This probably means the joblabel is invalid.

      To solve this issue, sanitise the username so that it contains only alphanumeric characters and dashes.

      There also seems to be no length restriction on the generated name, while Dataflow imposes a 63-character limit in the case above. Capping the name at a length substantially shorter than that, to leave room for suffixes (like -harness in this case), may be wise.
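
      The two suggestions above could be combined roughly as follows. This is only a sketch under assumptions, not the fix that was applied in Beam: the function names, the 40-character budget, the timestamp format, and the choice to drop (rather than replace) unsupported characters are all hypothetical. Names are also lowercased, since the pattern in the error message accepts only lowercase letters.

```python
import getpass
import re
from datetime import datetime, timezone

# Assumed budget, not a value from the Beam source: well under the
# 63-character limit so that suffixes such as "-harness" still fit.
MAX_JOB_NAME_LENGTH = 40


def sanitize_username(raw):
    """Lowercase the name and drop everything except a-z, 0-9 and dashes."""
    return re.sub(r'[^a-z0-9-]', '', raw.lower())


def make_default_job_name(user=None, now=None, max_length=MAX_JOB_NAME_LENGTH):
    """Build 'beamapp-<user>-<timestamp>' within max_length characters."""
    if user is None:
        user = getpass.getuser()
    if now is None:
        now = datetime.now(timezone.utc)
    prefix = 'beamapp'
    timestamp = now.strftime('%m%d%H%M%S')
    # Room left for the username after the prefix, timestamp and two dashes.
    room = max(max_length - len(prefix) - len(timestamp) - 2, 0)
    user = sanitize_username(user)[:room].strip('-')
    # Skip the username entirely if nothing usable remains.
    parts = [prefix, user, timestamp] if user else [prefix, timestamp]
    return '-'.join(parts)
```

      For the username from the error above, make_default_job_name(user='dennis.docter', now=datetime(2017, 3, 23, 13, 24, 0)) gives 'beamapp-dennisdocter-0323132400', which matches the pattern quoted in the error message.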

            People

              Assignee: Pablo Estrada (pabloem)
              Reporter: Dennis Docter (d23)
              Votes: 0
              Watchers: 3
