What happens with partial updates in that case? Suppose an increment operation is requested which succeeds locally but is not propagated to the sub shard.
If we're talking about failures due to the sub-shard already being active when it receives an update from the old shard who thinks it's still the leader, then I think we're fine. This isn't a new failure mode, but just another way that the old shard can be out of date. For example, once a normal update is received by the new shard, the old shard will be out of date anyway.
If the client retries, the index will have wrong values.
If the client retries to the same old shard that is no longer the leader, then the update will fail again because the sub-shard will reject it again? We could perhaps return an error code suggesting that the client is using stale cluster state (i.e. re-read before trying the update again).