Please bear with me. At first this thought may sound stupid, but it has its merits.
How do you build a distributed producer at the moment? You use Kafka Connect, which in turn requires a cluster that decides which instance produces which partitions.
On the consumer side it is different. There, Kafka itself does the data distribution. If you have 10 Kafka partitions and 10 consumers, each gets the data of one partition. With 5 consumers, each gets the data of two partitions. And if only a single consumer is active, it gets all the data. All of this is managed by Kafka; all you have to do is start as many consumers as you want.
I'd like to suggest something similar for producers. A producer would tell Kafka that its source has 10 partitions. The Kafka server then responds with the list of partitions this instance shall be responsible for. If it is the only producer, the response would be all 10 partitions. If a second instance starts up, the first instance would be told to produce data for partitions 1-5 and the new one for partitions 6-10. If a producer fails to respond with an alive packet, a rebalance happens: the remaining active producers are told to take over the extra load, and the dead producer gets an error the next time it tries to send data.
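The assignment logic described above could mirror Kafka's consumer-side range assignor. A minimal sketch of that arithmetic (names and signature are my own, not an existing Kafka API; partitions are numbered 1-based to match the example above):

```python
def assign_partitions(num_partitions: int, num_producers: int) -> list[list[int]]:
    """Range-assign source partitions to producer instances.

    Returns a list whose i-th entry is the list of partitions owned by
    producer instance i. Partitions that don't divide evenly are spread
    over the first instances, as Kafka's range assignor does for consumers.
    """
    base, extra = divmod(num_partitions, num_producers)
    assignments, start = [], 1
    for i in range(num_producers):
        count = base + (1 if i < extra else 0)
        assignments.append(list(range(start, start + count)))
        start += count
    return assignments
```

With one producer the single instance owns partitions 1-10; when a second instance joins, a rebalance recomputes this as [1-5] and [6-10].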
For restarts, the producer rebalance also has to send the starting point from which to resume producing, of course. Ideally this is a user-generated pointer and not the topic offset; then it can be, e.g., a database system change number, a database transaction id, or something similar.
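To illustrate what such a rebalance response might carry, here is a sketch of a per-partition checkpoint holding an opaque, user-generated pointer. All names here are hypothetical, not part of any Kafka protocol:

```python
from dataclasses import dataclass


@dataclass
class ProducerCheckpoint:
    """Hypothetical per-partition state a rebalance response could carry."""
    partition: int
    source_pointer: str  # user-generated, e.g. a DB SCN or transaction id


def resume_point(checkpoints: list[ProducerCheckpoint],
                 partition: int, default: str = "") -> str:
    """Where a newly assigned producer should resume for a partition.

    Falls back to `default` for partitions nothing was ever produced for.
    The pointer is opaque to Kafka; only the source system interprets it.
    """
    for cp in checkpoints:
        if cp.partition == partition:
            return cp.source_pointer
    return default
```

Because the pointer is opaque, the same mechanism works whether the source tracks a change number, a transaction id, or a file position.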