A single call to _STD::messages<T>::do_get() can call __rw_manage_cat_data() up to three times. Each call locks and unlocks a mutex, so the repeated calls waste cycles. It would be nice to reduce this to one call. Perhaps _RW::__rw_get_message() could be changed to also fill in a pointer to the _STD::__rw_locale that is kept in the cache, so that _RW::__rw_get_locale() could be removed.
For binary compatibility reasons, we may need to add an overload of _RW::__rw_get_message() and deprecate the existing functions so that they can be removed in the next major release.
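
A rough sketch of the idea, not the library's actual code: every name below (CatCache, CatEntry, Locale, get_message, get_locale, lock_count) is a hypothetical stand-in, and the real cache layout behind __rw_manage_cat_data() is not shown here. The point is only that an overload with an extra out-parameter can hand back both the message and the cached locale under one lock, while the old entry points stay in place for binary compatibility.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the cached _STD::__rw_locale.
struct Locale { std::string name; };

// Hypothetical cache entry: one open catalog with its locale and messages.
struct CatEntry {
    Locale loc;
    std::map<int, std::string> msgs;
};

class CatCache {
    std::mutex mtx_;
    std::map<int, CatEntry> cats_;
    int lock_count_ = 0;   // counts lookups that took the lock (for illustration)

public:
    void add(int cat, const CatEntry& e) {
        std::lock_guard<std::mutex> guard(mtx_);
        cats_[cat] = e;
    }

    // Old-style lookup: takes the lock, returns only the message.
    const std::string* get_message(int cat, int id) {
        std::lock_guard<std::mutex> guard(mtx_);
        ++lock_count_;
        auto it = cats_.find(cat);
        if (it == cats_.end()) return nullptr;
        auto m = it->second.msgs.find(id);
        return m == it->second.msgs.end() ? nullptr : &m->second;
    }

    // Old-style lookup: takes the lock again, returns only the locale.
    const Locale* get_locale(int cat) {
        std::lock_guard<std::mutex> guard(mtx_);
        ++lock_count_;
        auto it = cats_.find(cat);
        return it == cats_.end() ? nullptr : &it->second.loc;
    }

    // New overload: one lock, returns the message and fills in *ploc.
    // Assumes the entry stays in the cache while the caller uses the pointers.
    const std::string* get_message(int cat, int id, const Locale** ploc) {
        std::lock_guard<std::mutex> guard(mtx_);
        ++lock_count_;
        if (ploc) *ploc = nullptr;
        auto it = cats_.find(cat);
        if (it == cats_.end()) return nullptr;
        if (ploc) *ploc = &it->second.loc;
        auto m = it->second.msgs.find(id);
        return m == it->second.msgs.end() ? nullptr : &m->second;
    }

    // Not synchronized; meant for single-threaded inspection only.
    int lock_count() const { return lock_count_; }
};
```

In this sketch, a caller that needs both the message and the locale pays for one lock/unlock with the three-argument overload, versus two with the separate functions. The two-argument get_message() and get_locale() are left untouched so existing binaries keep linking until they are dropped.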