The assumption of identical partitioning depends on the engine. Perhaps it does not hold for Flink at all?
In this case (checkpoint or not), the assumption is that collection.map(x => x) changes neither the allocation of data to splits nor the ordering inside every split (a.k.a. partition). If this holds, then input and output are identically partitioned.
Therefore, if B = A.map(x => ...), then A and B are identically partitioned, and A + B can be optimized as A.zip(B).map(_._1 + _._2). If A and B are not identically partitioned, then an elementwise binary function would require a pre-join, which is much more expensive than a zip.
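To make the reasoning concrete, here is a minimal sketch in plain Python (no Spark; partitions are just nested lists, and all names/values are made up for illustration). It shows that a per-partition map preserves partitioning, so the elementwise sum can be computed by a cheap partition-wise zip, and that the result agrees with the expensive join-by-key fallback:

```python
# A "dataset" as a list of partitions (splits) of (key, value) rows.
A = [[(0, 1.0), (1, 2.0)], [(2, 3.0), (3, 4.0)]]

# B = A.map(x => x * 10): a per-partition map preserves both the
# allocation of rows to splits and their order within each split,
# so A and B are identically partitioned.
B = [[(k, v * 10) for k, v in part] for part in A]

# Identically partitioned => elementwise A + B is a partition-wise zip,
# with no data movement between partitions.
sum_zip = [[(ka, va + vb) for (ka, va), (_, vb) in zip(pa, pb)]
           for pa, pb in zip(A, B)]

# Without that guarantee, we must fall back to a join by key;
# in a real engine this implies a shuffle, which is much more expensive.
b_by_key = {k: v for part in B for k, v in part}
sum_join = [[(k, va + b_by_key[k]) for k, va in part] for part in A]

assert sum_zip == sum_join
print(sum_zip)
```

The zip path only touches co-located partitions, which is exactly the optimization the identical-partitioning assumption unlocks; the join path is the safe fallback when that assumption cannot be trusted.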
This test simply provokes this optimization (in Spark). If an engine does not support zips, or the assumption of identical partitioning does not hold, then the engine's optimizer should rectify the situation by always executing a join() after mapBlocks. Check back with me for more info on where to hack it if that is indeed the case.