Details
Type: Bug
Status: Open
Priority: P2
Resolution: Unresolved
None
None
Description
Unfortunately, I haven't been able to diagnose the exact issue or come up with a minimal repro. I only have code that reproduces it in https://github.com/apache/beam/pull/16445.
That PR adds support for value_counts(bins) in the DataFrame API. For some reason this interacts poorly with pipeline pruning in Interactive Beam: rehydrating the pipeline raises an error saying a PCollection's producer is missing. The PR also adds a test to transform_test.py that replicates the issue, along with a temporary mitigation in pipeline_fragment.py. I think the mitigation effectively disables pipeline pruning, so it likely shouldn't be merged.
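To make the reported error concrete, here is a simplified, hypothetical model of pruning. It is not Beam's pipeline_fragment.py code. The names Transform, prune, and check_producers, as well as the transform and PCollection names in the usage, are illustrative only.

Correct pruning keeps every transform reachable by walking producers back from the target PCollections. If any upstream producer is dropped, a consistency check fails with a "producer missing" style error. The sketch says nothing about why value_counts(bins) in particular triggers the problem.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Transform:
    # Hypothetical stand-in for a pipeline transform: named PCollection ids in and out.
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def prune(transforms: List[Transform], targets: Iterable[str]) -> List[Transform]:
    """Keep only the transforms needed (transitively) to compute the targets."""
    producer = {pc: t for t in transforms for pc in t.outputs}
    needed, seen, stack = set(), set(), list(targets)
    while stack:
        pc = stack.pop()
        if pc in seen:
            continue
        seen.add(pc)
        t = producer.get(pc)
        if t is None:  # a root input with no producing transform
            continue
        if t.name not in needed:
            needed.add(t.name)
            stack.extend(t.inputs)  # follow *all* inputs, not just the first
    # Preserve the original (topological) order of the kept transforms.
    return [t for t in transforms if t.name in needed]


def check_producers(transforms: List[Transform], roots: Iterable[str] = ()) -> None:
    """Raise if a kept transform consumes a PCollection whose producer was pruned."""
    produced = {pc for t in transforms for pc in t.outputs} | set(roots)
    for t in transforms:
        for pc in t.inputs:
            if pc not in produced:
                raise ValueError(
                    f"producer of PCollection {pc!r} (consumed by {t.name!r}) is missing"
                )


# Hypothetical pipeline: "Count" consumes two upstream PCollections.
pipeline = [
    Transform("Read", [], ["pc_read"]),
    Transform("Bins", ["pc_read"], ["pc_edges"]),
    Transform("Count", ["pc_read", "pc_edges"], ["pc_counts"]),
    Transform("Unrelated", ["pc_read"], ["pc_other"]),
]
```

Using it: `prune(pipeline, ["pc_counts"])` keeps Read, Bins, and Count, and that result passes `check_producers`. If a buggy pruner also dropped "Bins", the check would raise an error naming `pc_edges`. That is the same shape of failure seen when rehydrating the pruned pipeline.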
Issue Links
- is related to: BEAM-13625 pipeline_fragment incorrectly prunes producer transform (Resolved)