The problem is in the OverseerCollectionProcessor (OCP) migrateKey method:
log.info("Requesting merge of temp source collection replica to target leader");
params = new ModifiableSolrParams();
...
setupAsyncRequest(asyncId, requestMap, params, sourceLeader.getNodeName());
sendShardRequest(targetLeader.getNodeName(), params, shardHandler);
...
    "MIGRATE failed to merge " + tempCollectionReplica2 +
    " to " + targetLeader.getStr("core") + " on node: " + targetLeader.getNodeName(),
...
completeAsyncRequest(asyncId, requestMap, results);
Notice that setupAsyncRequest is called with sourceLeader.getNodeName(), but the actual request is sent to targetLeader.getNodeName(). Fixing this part is easy enough: setupAsyncRequest must be given the target leader's node name.
I tried to see why our existing AsyncMigrateRouteKey test doesn't tickle this problem, and I was surprised that the test asks the wrong node for the status yet always gets the right answer. Then I realized why: all the nodes in our tests are loaded by the same classloader, and since CoreAdminHandler keeps the requests in a static map, any node can report the status of an async core admin API call. The request map in CoreAdminHandler doesn't need to be static. Once I changed it to an instance variable, the existing test reproduced this problem easily.
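A minimal, self-contained sketch of why a static request map hides the bug. StaticMapHandler and InstanceMapHandler are hypothetical stand-ins for CoreAdminHandler (they are not Solr classes), and the "completed"/"notfound" status strings are assumptions for illustration. Each handler instance plays the role of one node in the same JVM: a request is submitted to one instance and its status is then asked of a different one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for a per-node core admin handler; not Solr's actual classes.
public class RequestMapScopeDemo {

  /** Tracks async requests in a map shared by every instance in the JVM. */
  static class StaticMapHandler {
    private static final Map<String, String> requests = new ConcurrentHashMap<>();

    void submit(String asyncId) { requests.put(asyncId, "completed"); }

    String status(String asyncId) { return requests.getOrDefault(asyncId, "notfound"); }
  }

  /** Tracks async requests per instance, i.e. per simulated node. */
  static class InstanceMapHandler {
    private final Map<String, String> requests = new ConcurrentHashMap<>();

    void submit(String asyncId) { requests.put(asyncId, "completed"); }

    String status(String asyncId) { return requests.getOrDefault(asyncId, "notfound"); }
  }

  /** Submits to the "target" node, then asks the "source" node for the status. */
  static String wrongNodeStatusStatic() {
    StaticMapHandler target = new StaticMapHandler();
    StaticMapHandler source = new StaticMapHandler();
    target.submit("1001");
    return source.status("1001"); // the shared static map answers for any node
  }

  static String wrongNodeStatusInstance() {
    InstanceMapHandler target = new InstanceMapHandler();
    InstanceMapHandler source = new InstanceMapHandler();
    target.submit("1001");
    return source.status("1001"); // the source node never saw this request
  }

  public static void main(String[] args) {
    System.out.println("static map:   " + wrongNodeStatusStatic());   // completed
    System.out.println("instance map: " + wrongNodeStatusInstance()); // notfound
  }
}
```

With the static map, the wrong-node status query still succeeds, which is exactly how the test masked the mismatch; with a per-instance map, it fails as it would across separate JVMs.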
We should refactor the code in OCP such that these situations become impossible. I'll put up a patch.
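One possible shape for such a refactoring, as a sketch only: a single hypothetical helper, sendAsyncShardRequest, takes the node name once and uses it both to register the async id and to send the request, so the two can no longer diverge. The helper and the simplified setupAsyncRequest/sendShardRequest signatures below are assumptions for illustration, not the existing OCP methods or the eventual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and signatures are illustrative, not the real OCP API.
public class AsyncShardRequestDemo {

  /** Records which node each async id was registered against, and where requests went. */
  static final List<String> trackedNodes = new ArrayList<>();
  static final List<String> sentNodes = new ArrayList<>();

  // Simplified stand-in for registering an async id against a node.
  private static void setupAsyncRequest(String asyncId, String nodeName) {
    trackedNodes.add(nodeName);
  }

  // Simplified stand-in for dispatching the core admin request to a node.
  private static void sendShardRequest(String nodeName) {
    sentNodes.add(nodeName);
  }

  /**
   * Single entry point: callers pass the node name exactly once, and it is
   * used both for tracking the async id and for sending the request.
   */
  static void sendAsyncShardRequest(String asyncId, String nodeName) {
    if (asyncId != null) {
      setupAsyncRequest(asyncId, nodeName);
    }
    sendShardRequest(nodeName);
  }

  public static void main(String[] args) {
    sendAsyncShardRequest("1001", "127.0.0.1:8983_solr");
    System.out.println(trackedNodes.equals(sentNodes)); // true
  }
}
```

Keeping the two lower-level calls private means no caller can register an async id against one node while sending the request to another.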
I'll also create an issue to enforce a separate classloader for each Jetty instance.