Looking at the committed diffs (JIRA was down last night and earlier today, so the lucene_solr_4_7 commit didn't post a comment on this issue, which sucks), I see that I didn't fully patch StandardTokenizerImpl.jflex, although I did correctly patch UAX29URLEmailTokenizerImpl.jflex, which is basically a superset of StandardTokenizerImpl.jflex.
I've added some more tests that expose the problem (the existing tests didn't fail); patch forthcoming. Here's an example that StandardTokenizer should split but currently doesn't. The problem is triggered by a preceding character whose Word_Break property value is ExtendNumLet, e.g. the underscore character:
A:B_A::B <- left intact, but should output "A:B_A", "B"
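To illustrate why "A:B_A", "B" is the expected output, here's a rough sketch of a small subset of the UAX#29 word-break rules (WB5, WB6, WB7, WB13a, WB13b, WB14), restricted to ASCII. It's only an assumption-laden illustration, not Lucene's or JFlex's implementation: it ignores WB4 (Extend/Format), numerics, and every other Word_Break value, and the `tokenize` helper and its "keep segments containing a letter" filter are hypothetical. The key point is that WB6/WB7 only let a MidLetter character (':') join two ALetter characters, so in "A::B" the first colon is followed by another colon, not a letter, and a boundary is required.

```python
# Hypothetical sketch: a simplified, ASCII-only subset of the UAX#29
# word-break rules. Not Lucene's or JFlex's actual implementation.

ALETTER, MIDLETTER, EXTENDNUMLET, OTHER = "ALetter", "MidLetter", "ExtendNumLet", "Other"


def word_break(ch: str) -> str:
    """Assign a (simplified) Word_Break property value to one character."""
    if ch.isascii() and ch.isalpha():
        return ALETTER
    if ch == ":":
        return MIDLETTER  # U+003A COLON has Word_Break=MidLetter
    if ch == "_":
        return EXTENDNUMLET  # U+005F LOW LINE has Word_Break=ExtendNumLet
    return OTHER


def is_break(props: list, i: int) -> bool:
    """Return True if there is a word boundary between positions i-1 and i."""
    prev, cur = props[i - 1], props[i]
    nxt = props[i + 1] if i + 1 < len(props) else None
    prev2 = props[i - 2] if i >= 2 else None
    if prev == ALETTER and cur == ALETTER:  # WB5
        return False
    if prev == ALETTER and cur == MIDLETTER and nxt == ALETTER:  # WB6
        return False
    if prev2 == ALETTER and prev == MIDLETTER and cur == ALETTER:  # WB7
        return False
    if prev in (ALETTER, EXTENDNUMLET) and cur == EXTENDNUMLET:  # WB13a (subset)
        return False
    if prev == EXTENDNUMLET and cur == ALETTER:  # WB13b (subset)
        return False
    return True  # WB14: break everywhere else


def tokenize(text: str) -> list:
    """Split text at word boundaries and keep segments that contain a letter."""
    if not text:
        return []
    props = [word_break(c) for c in text]
    segments, start = [], 0
    for i in range(1, len(text)):
        if is_break(props, i):
            segments.append(text[start:i])
            start = i
    segments.append(text[start:])
    # Simplification: drop segments made only of punctuation (e.g. ":").
    return [s for s in segments if any(word_break(c) == ALETTER for c in s)]


print(tokenize("A:B_A::B"))  # ['A:B_A', 'B']
```

With these rules "A:B" stays whole (WB6/WB7), and the underscore is absorbed into the preceding word by WB13a/WB13b, but the "::" sequence forces boundaries on both sides.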
By contrast, the current UAX29URLEmailTokenizer gets the above right.
In the JFlex 1.5.0 release, I added the ability to include external files in the rules section of the scanner specification, and I want to take advantage of this to refactor StandardTokenizer and UAX29URLEmailTokenizer so that the shared rules are defined in only one place. (That would have prevented the problem for which I'm reopening this issue.) I'll make a separate issue for that.