Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
Description
Repro:
- Enable tez.shuffle-vertex-manager.enable.auto-parallel.
- Kill the Tez AM container after the job has reached the point where the vertex manager has reconfigured the edge.
- The new Tez AM attempt fails with the following error.
org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist
    at org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexStarted(ShuffleVertexManager.java:497)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:589)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:658)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:653)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
This happens because the edge's data movement type changes to DataMovementType.CUSTOM after reconfiguration, so the check below no longer counts that edge as a bipartite source. Allowing DataMovementType.CUSTOM in the following check seems to fix the issue.
if (entry.getValue().getDataMovementType() == DataMovementType.SCATTER_GATHER) {
  bipartiteSources++;
}
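The suggested change can be sketched in isolation as below. This is only an illustration of the idea in the description, not the Tez source and not a committed fix (the issue was resolved as Invalid). The class BipartiteSourceCheck, the method countBipartiteSources, the vertex names, and the local DataMovementType enum are hypothetical stand-ins so the snippet compiles without Tez on the classpath; the map of source vertex name to data movement type is a simplification of the edge properties the real vertex manager iterates over.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, self-contained sketch; not the actual ShuffleVertexManager code.
class BipartiteSourceCheck {

    // Local stand-in for Tez's data movement types (assumption: declared here
    // only so the example compiles on its own).
    enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER, CUSTOM }

    // Counts source edges treated as bipartite. Unlike the original check,
    // CUSTOM is accepted too, since a reconfigured edge reports CUSTOM.
    static int countBipartiteSources(Map<String, DataMovementType> sourceEdges) {
        int bipartiteSources = 0;
        for (Map.Entry<String, DataMovementType> entry : sourceEdges.entrySet()) {
            DataMovementType type = entry.getValue();
            if (type == DataMovementType.SCATTER_GATHER
                    || type == DataMovementType.CUSTOM) {
                bipartiteSources++;
            }
        }
        return bipartiteSources;
    }

    public static void main(String[] args) {
        // State as seen by a recovered AM: the shuffle edge was already reconfigured.
        Map<String, DataMovementType> recovered = new LinkedHashMap<>();
        recovered.put("Map 1", DataMovementType.CUSTOM);
        recovered.put("Map 2", DataMovementType.BROADCAST);

        if (countBipartiteSources(recovered) < 1) {
            throw new IllegalStateException("Atleast 1 bipartite source should exist");
        }
        System.out.println("bipartite sources: " + countBipartiteSources(recovered));
    }
}
```

One caveat of this approach is that it counts every CUSTOM edge, including custom edges that were never scatter-gather to begin with, so a real change would likely need a narrower condition.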