I think it should be safe to use a WeakHashMap for hardRefs instead of a HashMap?
This way, if a thread has finished and its Thread object is otherwise GCable, its entry in hardRefs should be cleared. But it's not clear to me precisely when that happens. WeakHashMap holds its keys through WeakReferences and only expunges stale entries lazily, the next time the map itself is accessed (get, put, size, etc.). Since hardRefs is only touched in set(), a dead thread's value would again only be cleared on some later set, so we haven't really improved the situation.
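A minimal sketch of the proposed change, assuming a stripped-down CloseableThreadLocal built from a `ThreadLocal<WeakReference<T>>` plus a `hardRefs` map (the class name `WeakKeyedCloseableThreadLocal` is hypothetical, and this is not the actual Lucene source). It shows why the swap alone may not help much: `get()` never touches `hardRefs`, so a dead thread's entry is only expunged when `set()` next calls `put`.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical, simplified CloseableThreadLocal variant (not Lucene's code).
class WeakKeyedCloseableThreadLocal<T> {
  // Per-thread weak handle to the value; get() only ever reads this.
  private final ThreadLocal<WeakReference<T>> t = new ThreadLocal<>();

  // Strong refs that keep each thread's value alive. With a WeakHashMap,
  // an entry whose Thread key was collected is only expunged on the next
  // access to this map, which in this class happens only inside set().
  private final Map<Thread, T> hardRefs = new WeakHashMap<>();

  protected T initialValue() {
    return null;
  }

  public T get() {
    WeakReference<T> weakRef = t.get();
    if (weakRef == null) {
      T iv = initialValue();
      if (iv != null) {
        set(iv);
      }
      return iv;
    }
    return weakRef.get();
  }

  public void set(T object) {
    t.set(new WeakReference<>(object));
    synchronized (hardRefs) {
      // put() is the only hardRefs access, so stale entries are purged here.
      hardRefs.put(Thread.currentThread(), object);
    }
  }

  public void close() {
    synchronized (hardRefs) {
      // Drop all strong refs; other threads' values become collectable.
      hardRefs.clear();
    }
    // Only the calling thread's ThreadLocal slot can be removed directly.
    t.remove();
  }
}
```

The design choice here is that the value stays in the map (strongly reachable) only as long as its Thread key is reachable; the open question remains how soon after the thread dies the map is touched again.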
Matthew, did you try that change, and did it improve the scenario above?
Failing that, I think we have to purge in get()... maybe we can amortize the cost (purge on every Nth get, where N scales with the number of entries in the map...).
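A rough sketch of the amortized purge-in-get idea, assuming hardRefs stays a plain HashMap keyed by Thread. The class name `PurgingThreadLocal`, the helper `hardRefCount()`, and the constant `PURGE_MULTIPLIER = 20` are all hypothetical, chosen only for illustration. Each purge scans the whole map, but the countdown to the next purge is reset proportionally to the number of live entries, so the scan cost is spread over that many get() calls.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: amortized purging of dead threads' entries from get().
class PurgingThreadLocal<T> {
  // Hypothetical tuning knob: larger values purge less often.
  private static final int PURGE_MULTIPLIER = 20;

  private final ThreadLocal<WeakReference<T>> t = new ThreadLocal<>();
  private final Map<Thread, T> hardRefs = new HashMap<>();
  // Number of get() calls remaining until the next purge.
  private final AtomicInteger countUntilPurge = new AtomicInteger(PURGE_MULTIPLIER);

  public T get() {
    WeakReference<T> weakRef = t.get();
    maybePurge();
    return weakRef == null ? null : weakRef.get();
  }

  public void set(T object) {
    t.set(new WeakReference<>(object));
    synchronized (hardRefs) {
      hardRefs.put(Thread.currentThread(), object);
    }
  }

  // Hypothetical helper, only here so the behavior can be observed.
  int hardRefCount() {
    synchronized (hardRefs) {
      return hardRefs.size();
    }
  }

  private void maybePurge() {
    // Exactly one caller sees the counter hit 0 and performs the purge.
    if (countUntilPurge.getAndDecrement() == 0) {
      purge();
    }
  }

  private void purge() {
    synchronized (hardRefs) {
      int stillAlive = 0;
      for (Iterator<Thread> it = hardRefs.keySet().iterator(); it.hasNext();) {
        if (!it.next().isAlive()) {
          it.remove(); // drop the hard ref held for a finished thread
        } else {
          stillAlive++;
        }
      }
      // N scales with the live entry count, amortizing the O(size) scan.
      countUntilPurge.set((1 + stillAlive) * PURGE_MULTIPLIER);
    }
  }
}
```

For example, if a worker thread calls `set()` and then exits, its entry survives until roughly the 21st subsequent `get()` from any thread, at which point the purge sees `isAlive() == false` and removes it.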
Also: I don't think PagedBytes should use CloseableThreadLocal... I think it should just allocate a new byte[].
Separately: maybe SnowballAnalyzer is too heavy...? Does it have some static data that ought to be loaded once and shared across analyzers... but isn't today?