A single call to _STD::messages<T>::do_get() may call __rw_manage_cat_data() up to three times to access the cache. Each access locks a mutex, so the repeated lookups waste cycles. It would be nice to reduce this to one access. Perhaps _RW::__rw_get_message() could be changed to also fill in a pointer to the _STD::locale that is kept in the cache, which would make _RW::__rw_get_locale() unnecessary.
For backward binary compatibility we would need to keep the existing functions around, but we could add an overload and then deprecate the old ones using the _RWSTD_VER macro.
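
To make the idea concrete, here is a rough, self-contained sketch of the locking pattern only. It is not the library's code: CatalogCache, CatalogEntry, open(), get_locale(), get_message() and lock_count() are all hypothetical names invented for illustration, and the real cache layout and the signatures of the _RW:: functions are not reproduced here. The two-argument accessors stand in for the existing functions that would be kept for binary compatibility; the three-argument overload stands in for the proposed one.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical cache entry: the locale and the messages of one open catalog.
struct CatalogEntry {
    std::locale loc;
    std::map<int, std::string> messages;
};

// Hypothetical stand-in for the mutex-protected catalog cache.
class CatalogCache {
public:
    // Registers a catalog and returns its id (takes the lock once).
    int open(const std::locale& loc, std::map<int, std::string> msgs) {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lock_count_;
        const int id = next_id_++;
        entries_.emplace(id, CatalogEntry{loc, std::move(msgs)});
        return id;
    }

    // Old-style accessor, kept for compatibility: one lock per call.
    const std::locale* get_locale(int id) {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lock_count_;
        const auto it = entries_.find(id);
        return it == entries_.end() ? nullptr : &it->second.loc;
    }

    // Old-style accessor, kept for compatibility: one lock per call.
    const std::string* get_message(int id, int msg_id) {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lock_count_;
        return find_message(id, msg_id, nullptr);
    }

    // Proposed-style overload: a single lock returns the message and also
    // fills in a pointer to the cached locale (null if the id is unknown).
    const std::string* get_message(int id, int msg_id,
                                   const std::locale** ploc) {
        assert(ploc != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        ++lock_count_;
        return find_message(id, msg_id, ploc);
    }

    // Number of times an accessor has taken the mutex so far.
    std::size_t lock_count() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return lock_count_;
    }

private:
    // Caller must hold mutex_. Returned pointers stay valid because this
    // sketch never erases entries from the map.
    const std::string* find_message(int id, int msg_id,
                                    const std::locale** ploc) {
        const auto it = entries_.find(id);
        if (it == entries_.end()) {
            if (ploc) *ploc = nullptr;
            return nullptr;
        }
        if (ploc) *ploc = &it->second.loc;
        const auto m = it->second.messages.find(msg_id);
        return m == it->second.messages.end() ? nullptr : &m->second;
    }

    mutable std::mutex mutex_;
    std::size_t lock_count_ = 0;
    int next_id_ = 1;
    std::map<int, CatalogEntry> entries_;
};
```

In this sketch a caller that needs both the message and the locale takes the mutex twice through the old accessors but only once through the overload. Deprecating the old accessors would then be a matter of guarding their declarations with a version check, which the sketch leaves out.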