The first paragraph assumes that reporting in parallel is faster; is that true? An alternative is for a DN to report to the standby only after it has reported to the primary. This should allow the cluster to start up more quickly. The downside, of course, is that you can't fail over to the standby until the second report has finished, but that's probably not a big deal. Another disadvantage is that it lets the standby get out of sync, but we have to handle an unsynchronized standby anyway (e.g. if the standby is brought up after the primary, or restarted while the primary is running).
The primary could also forward BRs to the standby, but I agree that we shouldn't pursue this approach: the implementation would be more complex and it unnecessarily restricts the potential parallelism (though I'm not sure it is actually slower; you could potentially transmit much less information over the network by reporting from the primary to the standby). It also makes supporting multiple standbys more difficult.
I like solution #1. Aside from the simplicity, I think avoiding a scan of all the DN disks is important; otherwise restarting the standby in a busy cluster will impact DN performance. You could also easily implement the above optimization of delaying the BR to the standby. 100M blocks seems low, e.g. a cluster with 4K hosts, 12 x 3TB drives per host, and 256MB blocks holds ~580M blocks in total. However, that's still < 10MB/host, so I think it's OK.
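
For anyone who wants to check the block-count estimate, here is a rough sketch of the arithmetic. It assumes completely full disks and full-sized blocks, and it counts replicas stored on disk, not unique blocks. The 24 bytes per block (three 64-bit longs) is my assumption about the report encoding, and reading "< 10MB/host" as the per-host block report size is also an assumption. The helper name is made up for illustration.

```python
def total_blocks(hosts, drives_per_host, drive_bytes, block_bytes):
    """Block replicas that fit on the cluster if every disk is full."""
    return hosts * drives_per_host * drive_bytes // block_bytes

TB, MB = 10**12, 10**6      # decimal units
TiB, MiB = 2**40, 2**20     # binary units

hosts = 4000
drives_per_host = 12

# The estimate depends on whether TB/MB are read as decimal or binary.
decimal_blocks = total_blocks(hosts, drives_per_host, 3 * TB, 256 * MB)
binary_blocks = total_blocks(hosts, drives_per_host, 3 * TiB, 256 * MiB)

# Per-host view, using the larger (binary) estimate.
blocks_per_host = binary_blocks // hosts

# ASSUMPTION: 24 bytes per reported block (three 64-bit longs).
BYTES_PER_REPORTED_BLOCK = 24
report_bytes_per_host = blocks_per_host * BYTES_PER_REPORTED_BLOCK

print(f"decimal units: {decimal_blocks / 1e6:.1f}M blocks")
print(f"binary units:  {binary_blocks / 1e6:.1f}M blocks")
print(f"blocks per host: {blocks_per_host}")
print(f"report size per host: {report_bytes_per_host / 1e6:.2f} MB")
```

Either reading of the units lands in the same ballpark as the ~580M figure, and under the 24-byte assumption a full report stays well under 10MB per host.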