Description
Creating a pipeline that writes data to a text file (map side) and then to Avro (reduce side) causes the text-side write, and any processing that happens on that branch, to be omitted from the plan.
Specifically, the name of the pipeline is:
Text(/simple.txt)S0[[S1+Text(/some/test/first)]/[S3]]+GBK+ungroup+PTables.values+Avro(/some/test/path)
However, the generated DOT is:
digraph G {
"Text(/simple.txt)" [label="Text(/simple.txt)" shape=folder];
"Avro(/some/test/path)" [label="Avro(/some/test/path)" shape=folder];
subgraph "cluster-job1" {
subgraph "cluster-job1-map"
subgraph "cluster-job1-reduce" {
label = Reduce; color = red;
"GBK@221482301@1822883541" [label="GBK" shape=box];
"PTables.values@1156570456@1822883541" [label="PTables.values" shape=box];
"ungroup@1830236047@1822883541" [label="ungroup" shape=box];
}
}
"ungroup@1830236047@1822883541" -> "PTables.values@1156570456@1822883541";
"GBK@221482301@1822883541" -> "ungroup@1830236047@1822883541";
"PTables.values@1156570456@1822883541" -> "Avro(/some/test/path)";
"Text(/simple.txt)" -> "S0@875319338@1822883541";
"S3@2118275672@1822883541" -> "GBK@221482301@1822883541";
"S0@875319338@1822883541" -> "S3@2118275672@1822883541";
}
This output is missing "S1" and the write to '/some/test/first'.
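One way to turn this observation into a repeatable check is sketched below. It is not part of the library: the class `DotPlanCheck` and its methods are hypothetical helpers that scan DOT text for quoted identifiers, strip the `@hash@hash` suffix seen in the output above, and report which expected stage or target names never appear. The DOT string in `main` is just the edge list from this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a library API): reports names absent from a DOT plan.
public class DotPlanCheck {
    // Matches any double-quoted DOT identifier or attribute value.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Collects every quoted string, dropping anything from the first '@' onward,
    // so "S0@875319338@1822883541" is recorded as "S0".
    public static Set<String> nodeNames(String dot) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = QUOTED.matcher(dot);
        while (m.find()) {
            String id = m.group(1);
            int at = id.indexOf('@');
            names.add(at >= 0 ? id.substring(0, at) : id);
        }
        return names;
    }

    // Returns the expected names that do not occur anywhere in the DOT text.
    public static List<String> missing(String dot, List<String> expected) {
        Set<String> present = nodeNames(dot);
        List<String> absent = new ArrayList<>();
        for (String name : expected) {
            if (!present.contains(name)) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Edges copied from the generated DOT in this report.
        String dot =
              "\"ungroup@1830236047@1822883541\" -> \"PTables.values@1156570456@1822883541\";\n"
            + "\"GBK@221482301@1822883541\" -> \"ungroup@1830236047@1822883541\";\n"
            + "\"PTables.values@1156570456@1822883541\" -> \"Avro(/some/test/path)\";\n"
            + "\"Text(/simple.txt)\" -> \"S0@875319338@1822883541\";\n"
            + "\"S3@2118275672@1822883541\" -> \"GBK@221482301@1822883541\";\n"
            + "\"S0@875319338@1822883541\" -> \"S3@2118275672@1822883541\";\n";

        List<String> expected = Arrays.asList("S0", "S1", "S3", "Text(/some/test/first)");
        // Prints: [S1, Text(/some/test/first)]
        System.out.println(missing(dot, expected));
    }
}
```

Matching on quoted strings rather than parsing DOT properly keeps the sketch short; it also picks up subgraph names and label values, which is harmless for a presence check like this one.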