Trym didn't mention it, but this is not just a negligible problem that will never cause trouble in real-world usage. We actually discovered it during one of our performance/endurance tests of our real-world application, in a real-world setup with a real-world (high) workload. We run numerous Solr instances in a SolrCloud cluster with numerous collections, each collection having about 25 slices and each slice having 2 shards (one replica per slice). During the test, the Solr instances lose their ZK connection (probably due to overly long GC pauses) and reconnect, which results in more watchers. The next time a disconnect/reconnect to ZK happens, they receive many watcher events, resulting in even more watchers for the next time. Seen from the outside, this breaks our performance/endurance test: at first things start to slow down, and eventually the JVMs die with OOM errors. The problem is self-reinforcing. With every iteration the garbage collector has to spend more time collecting watchers (twice as many as last time), which increases the probability of new ZK timeouts, and more time has to be spent creating new watchers (twice as many as last time).
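To make the doubling concrete, here is a small Java simulation. It is a hypothetical model, not Solr's actual code: the class, the method names, and the `/collections/c<i>` paths are made up for illustration. It assumes that on every reconnect each registered watcher receives an event and its handler registers a new watcher without the old registration ever being dropped. Under that assumption, the watcher count after n reconnects is the initial count times 2^n. The "fixed" variant keys registrations by path, so re-registering is a no-op and the count stays constant.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of watcher accumulation on ZK reconnect (not Solr code).
class WatcherGrowthSketch {

    // Buggy model: each reconnect fires an event on every registered watcher,
    // and each handler adds a fresh registration while the old one is kept.
    static int buggyWatchersAfter(int initialWatchers, int reconnects) {
        List<String> watchers = new ArrayList<>();
        for (int i = 0; i < initialWatchers; i++) {
            watchers.add("/collections/c" + i); // hypothetical watched paths
        }
        for (int r = 0; r < reconnects; r++) {
            // Snapshot of the watchers that receive an event on this reconnect.
            List<String> fired = new ArrayList<>(watchers);
            for (String path : fired) {
                watchers.add(path); // duplicate registration, count doubles
            }
        }
        return watchers.size();
    }

    // Fixed model: registrations are keyed by path, so re-registering an
    // already-watched path does not add anything.
    static int fixedWatchersAfter(int initialWatchers, int reconnects) {
        Set<String> watchers = new HashSet<>();
        for (int i = 0; i < initialWatchers; i++) {
            watchers.add("/collections/c" + i);
        }
        for (int r = 0; r < reconnects; r++) {
            for (String path : new ArrayList<>(watchers)) {
                watchers.add(path); // idempotent, set size unchanged
            }
        }
        return watchers.size();
    }

    public static void main(String[] args) {
        // Start from 25 watchers, loosely matching the ~25 slices per collection.
        for (int r = 0; r <= 5; r++) {
            System.out.println(r + " reconnects: buggy=" + buggyWatchersAfter(25, r)
                    + ", fixed=" + fixedWatchersAfter(25, r));
        }
    }
}
```

Running `main` ends with the line `5 reconnects: buggy=800, fixed=25`. The model ignores GC cost, but it shows why each disconnect makes the next one both more likely and more expensive.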
I think you should commit the fix, basically because it makes a real-world application (ours) able to run for a long time, which it could not before. Commit the fix not so much for our sake, since we use our own build of Solr anyway, including this fix, other fixes, and a nice implementation of optimistic locking (SOLR-3173, SOLR-3178, etc.). Commit it to save others (who might also be among the "first movers" using Solr 4.0 for high-scale real-world applications) from spending weeks tracking down the essence of this issue and making a fix.
If you think this observation/fix should lead to a walkthrough of the code, to check whether watchers are used undesirably in other places and maybe even arrive at a more generic fix, I would endorse such a task. But for now I urge you to commit it, to save others from weeks of debugging. If/when you come up with a better or more generic solution, you can always refactor.
Regards, Per Steffensen