I think the right way to do this is to keep a sequence in ZooKeeper that increments atomically and use it for id generation. On startup, a node that has no id can generate one for itself and store it.
One tricky bit is that this node id needs to be stored with the data, but we now partition data over multiple disks and hope to be able to survive the destruction of any of them. Which disk should we store the node id on? I would recommend storing it on all of them: if it is missing on some we will add it there, and if the id is inconsistent between disks we will error out (this should never happen).
I would recommend adding a properties file named "meta" in every data directory containing the "id=x" value. We can extend this later with more permanent values; for example, I think it would be nice to add a data format version to help with in-place data upgrades.
On startup the broker would check this value for consistency across directories. If none of the directories has it, the broker would auto-generate a node id and persist it for future use.
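To make the startup check concrete, here is a rough sketch in Python (not the broker's actual code). The function names and the generate_id callback are made up for illustration, and it assumes the "meta" file holds only the id line for now.

```python
import os

META_FILE = "meta"  # file name proposed above


def read_id(data_dir):
    """Return the id stored in data_dir/meta, or None if it is absent."""
    path = os.path.join(data_dir, META_FILE)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep and key.strip() == "id":
                return int(value)
    return None


def write_id(data_dir, node_id):
    """Write id=<node_id> to data_dir/meta via a temp file and rename.

    Assumption: the file holds only the id, so rewriting it loses nothing.
    """
    path = os.path.join(data_dir, META_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"id={node_id}\n")
    os.replace(tmp, path)


def resolve_node_id(data_dirs, generate_id):
    """Check the stored id across all data directories and fill in gaps.

    generate_id is a hypothetical callback standing in for the
    ZooKeeper-backed sequence; it is only called when no directory has an id.
    """
    found = {d: read_id(d) for d in data_dirs}
    distinct = {v for v in found.values() if v is not None}
    if len(distinct) > 1:
        # Should never happen; refuse to start rather than guess.
        raise RuntimeError(f"inconsistent node ids across data directories: {found}")
    node_id = distinct.pop() if distinct else generate_id()
    for d, v in found.items():
        if v is None:
            write_id(d, node_id)  # add the id where it is missing
    return node_id
```

A directory that lost its "meta" file (say, a replaced disk) just gets the id copied back from the others, so only the case where no directory has an id touches the sequence.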
For compatibility we would retain the current id configuration value: if it is present we will use it, and we will ensure the id sequence is larger than this value so that generated ids cannot collide with it.
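The compatibility rule could look something like the sketch below. IdSequence is an in-memory stand-in I made up for the ZooKeeper sequence (a real version would need an atomic update in ZooKeeper instead of a local lock), and choose_id is likewise a hypothetical name.

```python
import threading


class IdSequence:
    """In-memory stand-in for the ZooKeeper-backed sequence (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next_id(self):
        """Atomically increment the sequence and return the new value."""
        with self._lock:
            self._last += 1
            return self._last

    def ensure_above(self, value):
        """Advance the sequence so every later next_id() exceeds value."""
        with self._lock:
            self._last = max(self._last, value)


def choose_id(configured_id, sequence):
    """Prefer the configured id if present, else draw one from the sequence."""
    if configured_id is not None:
        sequence.ensure_above(configured_id)
        return configured_id
    return sequence.next_id()
```

With this, a broker configured with id 5 keeps 5, and the next auto-generated id is at least 6.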