Details
Type: Bug
Status: Resolved
Priority: P2
Resolution: Fixed
2.31.0, 2.32.0, 2.33.0
None
Description
If you call to_csv on a DeferredDataFrame twice in a single pipeline, like this:
import pandas as pd
import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe, to_pcollection

df1 = pd.DataFrame.from_records({"a": "b"}, index=[0])
df2 = pd.DataFrame.from_records({"a": "b"}, index=[0])

with beam.Pipeline() as p:
    df1 = to_dataframe(to_pcollection(df1, pipeline=p), label="df1")
    df2 = to_dataframe(to_pcollection(df2, pipeline=p), label="df2")
    df1.to_csv("test.csv")
    df2.to_csv("test2.csv")
you get this error on the second to_csv call:
RuntimeError: A transform with label "ToPCollection(df)" already exists in the pipeline. To apply a transform with a specified label write pvalue | "label" >> transform
I think this happens because to_csv calls to_pcollection without a label, so the same label is inferred for both to_csv calls.
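To illustrate the suspected mechanism, here is a minimal sketch in plain Python. It does not use Beam and is not Beam's actual implementation: ToyPipeline, toy_to_csv and demo are hypothetical names, and the only assumption is that a pipeline rejects a transform whose label has already been applied, and that a transform applied without an explicit label falls back to a fixed inferred one.

```python
class ToyPipeline:
    """Simplified stand-in for a pipeline that requires unique transform labels."""

    def __init__(self):
        self.applied_labels = set()

    def apply(self, default_label, label=None):
        # Fall back to the inferred (default) label when none is given.
        full_label = label or default_label
        if full_label in self.applied_labels:
            raise RuntimeError(
                f'A transform with label "{full_label}" already exists in the pipeline.'
            )
        self.applied_labels.add(full_label)
        return full_label


def toy_to_csv(pipeline, label=None):
    # Hypothetical model of to_csv: it converts the frame back to a
    # PCollection, and without a label the inferred one is always the same.
    return pipeline.apply("ToPCollection(df)", label=label)


def demo():
    p = ToyPipeline()
    toy_to_csv(p)  # first call registers "ToPCollection(df)"
    try:
        toy_to_csv(p)  # second call infers the same label and collides
    except RuntimeError as err:
        collision = str(err)
    else:
        collision = None

    # Passing distinct labels avoids the collision in this model.
    q = ToyPipeline()
    labels = [toy_to_csv(q, label="csv1"), toy_to_csv(q, label="csv2")]
    return collision, labels
```

In this model the second unlabeled call fails in the same way as the second to_csv call above, while calls that pass distinct labels succeed. That suggests a fix where to_csv passes a unique label to its internal to_pcollection call.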