Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Version/s: 1.6.3, 1.6.4, 1.7.2, 1.8.0
Description
As reported on the mailing list[1], when more than one operator is chained into a single task and all of those operators have state, data can be lost silently.
Currently, the local state directory has the following layout:
../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state),
If more than one operator is chained into a single task and all of those operators have state, they all share the same local directory (because they have the same vertex_id), which leads to data loss.
The path is generated by the following logic:
// LocalRecoveryDirectoryProviderImpl.java
@Override
public File subtaskSpecificCheckpointDirectory(long checkpointId) {
    return new File(subtaskBaseDirectory(checkpointId), checkpointDirString(checkpointId));
}

@VisibleForTesting
String subtaskDirString() {
    return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + subtaskIndex).toString();
}

@VisibleForTesting
String checkpointDirString(long checkpointId) {
    return "chk_" + checkpointId;
}
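To make the collision concrete, here is a minimal, self-contained Java sketch that re-creates the naming scheme above. It is a simplification under stated assumptions: IDs are plain strings rather than Flink's JobID/JobVertexID types, the root directory and operator names are hypothetical, and subtaskBaseDirectory() is approximated as root + subtaskDirString(). Because no operator identifier enters the path, every stateful operator in a chain resolves to the same checkpoint directory.

```java
import java.io.File;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified re-creation of the local recovery directory naming (not Flink code). */
public class LocalDirCollision {

    // Mirrors subtaskDirString(): only job ID, vertex ID and subtask index are encoded.
    static String subtaskDirString(String jobId, String jobVertexId, int subtaskIndex) {
        return Paths.get("jid_" + jobId, "vtx_" + jobVertexId + "_sti_" + subtaskIndex).toString();
    }

    // Mirrors checkpointDirString().
    static String checkpointDirString(long checkpointId) {
        return "chk_" + checkpointId;
    }

    // Assumption: approximates subtaskSpecificCheckpointDirectory() as root/subtaskDir/chk_N.
    static File checkpointDir(File root, String jobId, String jobVertexId,
                              int subtaskIndex, long checkpointId) {
        return new File(new File(root, subtaskDirString(jobId, jobVertexId, subtaskIndex)),
                checkpointDirString(checkpointId));
    }

    // Counts how many distinct directories the operators of one chained task would get.
    static int distinctCheckpointDirs(File root, String jobId, String jobVertexId, int subtaskIndex,
                                      long checkpointId, List<String> chainedOperatorIds) {
        Set<File> dirs = new HashSet<>();
        for (String operatorId : chainedOperatorIds) {
            // operatorId is deliberately unused: the naming scheme has no place for it.
            dirs.add(checkpointDir(root, jobId, jobVertexId, subtaskIndex, checkpointId));
        }
        return dirs.size();
    }

    public static void main(String[] args) {
        File root = new File("local_state_root_1", "allocation_id");
        // Two hypothetical stateful operators chained into the same task (same vertex, same subtask).
        List<String> chain = Arrays.asList("stateful-map", "stateful-window");
        int distinct = distinctCheckpointDirs(root, "job1", "vtx1", 0, 1L, chain);
        System.out.println(distinct); // prints 1: both operators share one directory
    }
}
```

Any fix therefore has to make the directory (or the file names inside it) depend on something that differs per chained operator; the sketch only demonstrates the collision and does not reflect how the issue was actually resolved.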