FYI, there is a nice Unicode web tool here: http://rishida.net/scripts/uniview/
Java identifiers exclude dash (minus) and dot (- and .); they allow $, € and other currency symbols.
An XML NMTOKEN excludes currency symbols, but allows dash, dot, middle dot, underscore, and colon. It also allows Arabic numerals [0-9] at the beginning.
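To make the difference concrete, here is a small Java sketch that classifies a few candidate names under both rules. The Java side uses the standard `Character.isJavaIdentifierStart`/`isJavaIdentifierPart` methods and ignores reserved words. The NMTOKEN side hand-codes the NameChar ranges of the XML 1.0 Fifth Edition; that choice is my assumption, since parsers built on older editions use different Unicode tables. The class name `IdentifierVsNmtoken` is hypothetical. One caveat the ranges reveal: in the Fifth Edition, € (U+20AC) falls inside the allowed block [#x2070-#x218F], so the currency exclusion reliably holds for $ but not for every currency sign.

```java
import java.util.List;

class IdentifierVsNmtoken {

    // NameStartChar, XML 1.0 Fifth Edition production [4].
    static boolean isXmlNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar, production [4a]: adds dash, dot, digits, middle dot and combining marks.
    static boolean isXmlNameChar(int c) {
        return isXmlNameStartChar(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // Nmtoken, production [7]: one or more NameChars, no special first-character rule.
    static boolean isNmtoken(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(IdentifierVsNmtoken::isXmlNameChar);
    }

    // Java identifier syntax only; reserved words such as "class" are not checked.
    static boolean isJavaIdentifier(String s) {
        return !s.isEmpty()
            && Character.isJavaIdentifierStart(s.codePointAt(0))
            && s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        for (String name : List.of("price$", "a-b", "a.b", "a\u00B7b", "a:b", "123abc")) {
            System.out.printf("%-8s java=%-5b nmtoken=%b%n",
                name, isJavaIdentifier(name), isNmtoken(name));
        }
    }
}
```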
Colons must be excluded for Solr purposes, since the query syntax uses the colon to separate field name and value. But I wouldn't exclude dash and dot.
Field names are declared in XML (schema.xml), so why not base their syntax on an XML type? Validation would then be easy:
<!ELEMENT solr-test EMPTY >
<!ATTLIST solr-test field NMTOKEN #REQUIRED>
<solr-test field=" 123.Käse-A_Z "/>
Note the leading and trailing spaces around the attribute value: because the attribute is declared as NMTOKEN, the XML parser strips them during attribute-value normalization, so this kind of user error is taken care of automatically. The absence of any colon, however, would have to be guaranteed by some other means. Still, I think the approach has advantages.
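The DTD check can be driven from Java with the JDK's built-in JAXP parser. This is a minimal sketch, assuming each candidate name is wrapped in the one-element `solr-test` document from the example; `NmtokenFieldCheck` and its `wrap`/`validate` helpers are hypothetical names, not Solr code. It reads back the attribute as normalized by the parser (surrounding spaces removed) and adds the separate colon check that NMTOKEN alone cannot provide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class NmtokenFieldCheck {

    // Hypothetical helper: embeds the candidate in the solr-test document with an internal DTD.
    static String wrap(String fieldValue) {
        return "<!DOCTYPE solr-test [\n"
             + "<!ELEMENT solr-test EMPTY >\n"
             + "<!ATTLIST solr-test field NMTOKEN #REQUIRED>\n"
             + "]>\n"
             + "<solr-test field=\"" + fieldValue + "\"/>";
    }

    // Returns the normalized name; throws SAXException if it is not an NMTOKEN,
    // IllegalArgumentException if it contains a colon.
    static String validate(String fieldValue) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // report DTD validity errors to the handler below
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        Document doc = builder.parse(new InputSource(new StringReader(wrap(fieldValue))));
        // Non-CDATA attributes are normalized: leading/trailing spaces are already gone.
        String name = doc.getDocumentElement().getAttribute("field");
        // NMTOKEN allows ':', so the Solr-specific restriction needs its own check.
        if (name.indexOf(':') >= 0) {
            throw new IllegalArgumentException("colon not allowed in field name: " + name);
        }
        return name;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("[" + validate(" 123.K\u00E4se-A_Z ") + "]");
    }
}
```

Because the candidate is spliced into XML text, any character or entity references in it would be expanded before validation; escaping the input first would avoid that.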
If guaranteeing that field names in schema.xml are unique matters, one could also consider the XML Name production instead, by declaring field/@name as ID in the DTD (ID values must match Name). This would exclude dash, dot, middle dot and Arabic numerals as start characters.
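With the ID variant, the same JAXP parser enforces both the stricter start-character rule and uniqueness. The sketch below uses a deliberately simplified, hypothetical DTD in which `field` elements sit directly under `schema` (the real schema.xml layout differs), and the class `IdFieldNameCheck` is likewise a hypothetical name. It collects validity errors instead of stopping at the first one. Keep in mind that ID values must be unique across every ID-typed attribute in the document, and that Name still permits a colon, so the colon check remains necessary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class IdFieldNameCheck {

    // Hypothetical, heavily simplified schema DTD: field/@name is declared as ID.
    static final String DTD =
          "<!DOCTYPE schema [\n"
        + "<!ELEMENT schema (field*)>\n"
        + "<!ELEMENT field EMPTY>\n"
        + "<!ATTLIST field name ID #REQUIRED>\n"
        + "]>\n";

    // Parses the given field names with DTD validation and returns the validity error messages.
    static List<String> validate(String... names) throws Exception {
        StringBuilder xml = new StringBuilder(DTD).append("<schema>");
        for (String n : names) {
            xml.append("<field name=\"").append(n).append("\"/>");
        }
        xml.append("</schema>");

        List<String> errors = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            // Collect validity errors (invalid Name, duplicate ID) instead of aborting.
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml.toString())));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("id", "K\u00E4se-A_Z")); // valid, distinct Names
        System.out.println(validate("id", "id"));            // duplicate ID
        System.out.println(validate("123.K\u00E4se-A_Z"));   // digit is not a Name start char
    }
}
```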
I think I could supply a patch for NMTOKEN- or Name-based validation if this is considered desirable.