Details
Type: Bug
Status: Resolved
Priority: P3
Resolution: Fixed
Description
Two different PipelineOptions interfaces define a 'zone' option: GcpOptions [1] and DataflowPipelineWorkerPoolOptions [2]. Redefining an option is not an error; internally, Beam checks that the definitions are compatible.
In this case the two 'zone' definitions are compatible, but their descriptions differ. This can be confusing, because setting one also changes the other.
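To illustrate why setting one 'zone' affects the other, here is a minimal sketch that does not use Beam at all. It assumes a dynamic proxy backed by a single map keyed by property name, which is only an approximation of how the options object behaves. GcpLikeOptions, WorkerPoolLikeOptions, and newOptions are hypothetical names invented for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class SharedOptionSketch {
  // Hypothetical stand-ins for the two Beam interfaces; not the real definitions.
  public interface GcpLikeOptions {
    String getZone();
    void setZone(String value);
  }

  public interface WorkerPoolLikeOptions {
    String getZone();
    void setZone(String value);
  }

  // Assumption: one map keyed by property name backs every interface the proxy implements.
  public static Object newOptions() {
    Map<String, Object> values = new HashMap<>();
    InvocationHandler handler = (proxy, method, args) -> {
      String name = method.getName();
      if (name.startsWith("set") && args != null && args.length == 1) {
        values.put(name.substring(3), args[0]); // "setZone" stores under "Zone"
        return null;
      }
      if (name.startsWith("get")) {
        return values.get(name.substring(3)); // "getZone" reads "Zone"
      }
      throw new UnsupportedOperationException(name);
    };
    return Proxy.newProxyInstance(
        SharedOptionSketch.class.getClassLoader(),
        new Class<?>[] {GcpLikeOptions.class, WorkerPoolLikeOptions.class},
        handler);
  }

  public static void main(String[] args) {
    Object options = newOptions();
    // Set the value through one interface, read it back through the other.
    ((GcpLikeOptions) options).setZone("us-central1-a");
    System.out.println(((WorkerPoolLikeOptions) options).getZone());
  }
}
```

Because both getters resolve to the same key, the value written through one view is the value read through the other, even though each interface may document 'zone' differently.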
We should improve how duplicate PipelineOptions definitions are handled for a given runner. In this case, I propose we:
a) Update the @Description annotations so that they match.
b) Mark one of them as @Deprecated with a link to the other, migrate code references, and plan to remove it in the next major version.
c) Add a test that checks all PipelineOptions on the DataflowRunner classpath and verifies that any duplicates have the properties above (equivalent definitions, including @Description, and only one non-@Deprecated version).
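The check proposed in (c) could be sketched roughly as follows. This is not the test that was added to Beam. It compiles without Beam by using a local stand-in for the @Description annotation, it takes an explicit list of interfaces rather than scanning the DataflowRunner classpath, and OptionsA, OptionsB, OptionsC, and findViolations are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateOptionCheck {
  // Stand-in for Beam's @Description so the sketch compiles without Beam.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  public @interface Description {
    String value();
  }

  public interface OptionsA {
    @Description("Compute Engine zone for workers.")
    String getZone();
  }

  public interface OptionsB {
    @Deprecated
    @Description("Compute Engine zone for workers.")
    String getZone();
  }

  public interface OptionsC {
    @Description("A different description.")
    String getZone();
  }

  /** Returns one message per rule that a duplicated getter breaks. */
  public static List<String> findViolations(List<Class<?>> interfaces) {
    // Group no-argument getters by name across all given interfaces.
    Map<String, List<Method>> byName = new LinkedHashMap<>();
    for (Class<?> iface : interfaces) {
      for (Method m : iface.getDeclaredMethods()) {
        if (m.getName().startsWith("get") && m.getParameterCount() == 0) {
          byName.computeIfAbsent(m.getName(), k -> new ArrayList<>()).add(m);
        }
      }
    }
    List<String> violations = new ArrayList<>();
    for (Map.Entry<String, List<Method>> e : byName.entrySet()) {
      List<Method> defs = e.getValue();
      if (defs.size() < 2) {
        continue; // not a duplicate
      }
      Set<String> descriptions = new HashSet<>();
      Set<Class<?>> types = new HashSet<>();
      int live = 0; // definitions not marked @Deprecated
      for (Method m : defs) {
        Description d = m.getAnnotation(Description.class);
        descriptions.add(d == null ? "" : d.value());
        types.add(m.getReturnType());
        if (!m.isAnnotationPresent(Deprecated.class)) {
          live++;
        }
      }
      if (types.size() > 1) {
        violations.add(e.getKey() + ": return types differ");
      }
      if (descriptions.size() > 1) {
        violations.add(e.getKey() + ": descriptions differ");
      }
      if (live != 1) {
        violations.add(e.getKey() + ": expected exactly one non-deprecated definition, found " + live);
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    System.out.println(findViolations(Arrays.asList(OptionsA.class, OptionsB.class)));
    System.out.println(findViolations(Arrays.asList(OptionsA.class, OptionsB.class, OptionsC.class)));
  }
}
```

In this sketch, OptionsA and OptionsB pass because their descriptions match and only one is live, while adding OptionsC triggers both the description and the deprecation rule. A real test would also need to discover the interfaces on the classpath and compare any other annotations that affect an option's definition.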
[1] https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L95
[2] https://github.com/apache/beam/blob/670941961845593d9a7e09b17c1bd117f27bf579/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L175