Details
New Feature
Status: Resolved
Major
Resolution: Won't Fix
0.9.0
None
None
Description
The DAG of an RDD can help users understand the data flow and how Spark executes the final RDD. It can also help users find opportunities to optimize the execution of complex RDDs. I will use Graphviz to visualize the DAG.
I plan to split this task into two steps.
Step 1. Visualize the plain DAG. Each RDD is one node, with one edge between each parent RDD and its child RDD. (I have attached a simple example graph.)
Step 2. Group the RDDs that belong to the same stage into one subgraph. This may require extracting the stage-splitting code from DAGScheduler.
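To make the two steps concrete, here is a rough sketch of the Graphviz DOT output they could produce. It is written in plain Python with no Spark dependency, so `RddNode`, `collect`, `assign_stages`, and `to_dot` are all hypothetical names used for illustration and are not part of Spark. The stage assignment is a simplification that assumes a new stage starts at every shuffle dependency and that each RDD belongs to exactly one stage; the real logic in DAGScheduler may differ, and the stage numbers here are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RddNode:
    """Hypothetical stand-in for an RDD: an id, a label, and its dependencies."""
    rdd_id: int
    name: str
    # Each dependency is (parent, is_shuffle); a shuffle marks a stage boundary.
    deps: List[Tuple["RddNode", bool]] = field(default_factory=list)


def collect(final: RddNode) -> List[RddNode]:
    """Walk the lineage from the final RDD and return each reachable node once."""
    seen: Dict[int, RddNode] = {}
    stack = [final]
    while stack:
        node = stack.pop()
        if node.rdd_id in seen:
            continue
        seen[node.rdd_id] = node
        stack.extend(parent for parent, _ in node.deps)
    return sorted(seen.values(), key=lambda n: n.rdd_id)


def assign_stages(final: RddNode) -> Dict[int, int]:
    """Assumed rule: narrow dependencies stay in the stage, shuffles start a new one."""
    stages: Dict[int, int] = {}
    next_stage = 0
    pending = [final]  # RDDs at which a new stage begins
    while pending:
        root = pending.pop()
        if root.rdd_id in stages:
            continue  # simplification: the first stage to reach an RDD keeps it
        stage, next_stage = next_stage, next_stage + 1
        stack = [root]
        while stack:
            node = stack.pop()
            if node.rdd_id in stages:
                continue
            stages[node.rdd_id] = stage
            for parent, is_shuffle in node.deps:
                (pending if is_shuffle else stack).append(parent)
    return stages


def to_dot(final: RddNode, group_by_stage: bool = False) -> str:
    """Step 1: one node per RDD, one edge per dependency. Step 2: cluster by stage."""
    nodes = collect(final)
    lines = ["digraph rdd_dag {"]
    if group_by_stage:
        stages = assign_stages(final)
        for stage in sorted(set(stages.values())):
            lines.append(f"  subgraph cluster_stage_{stage} {{")
            lines.append(f'    label="Stage {stage}";')
            for n in nodes:
                if stages[n.rdd_id] == stage:
                    lines.append(f'    rdd_{n.rdd_id} [label="{n.name}"];')
            lines.append("  }")
    else:
        for n in nodes:
            lines.append(f'  rdd_{n.rdd_id} [label="{n.name}"];')
    for n in nodes:
        for parent, is_shuffle in n.deps:
            # Shuffle edges are dashed so stage boundaries stand out.
            style = " [style=dashed]" if is_shuffle else ""
            lines.append(f"  rdd_{parent.rdd_id} -> rdd_{n.rdd_id}{style};")
    lines.append("}")
    return "\n".join(lines)


# Toy word-count lineage; the names are labels only, not real RDD objects.
text_file = RddNode(0, "textFile")
words = RddNode(1, "flatMap", [(text_file, False)])
pairs = RddNode(2, "map", [(words, False)])
counts = RddNode(3, "reduceByKey", [(pairs, True)])
print(to_dot(counts, group_by_stage=True))
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tpng dag.dot -o dag.png`. Labels are written without escaping, so names containing double quotes would need to be escaped first.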