Details
Type: Bug
Status: Resolved
Priority: P3
Resolution: Fixed
Description
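The regex used to split text into words ([A-Za-z\']+) matches only ASCII Latin letters and apostrophes, so words in other scripts are dropped and accented Latin words are broken apart. Change it so that word splitting also works on non-Latin input.
For an example, see the Java version: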
https://github.com/apache/beam/blob/367fcb28d544934797d25cb34d54136b2d7d6e99/examples/java/src/main/java/org/apache/beam/examples/common/ExampleUtils.java#L75
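
A minimal sketch of one possible fix on the Python side, assuming the goal is simply to treat letters from any script as word characters. The names LATIN_ONLY, UNICODE_WORDS, and split_words are hypothetical and used only for illustration; this is not necessarily the change that resolved the issue. Note that \w also matches digits and underscores, so it is somewhat broader than a letters-only pattern.

```python
import re

# Pattern quoted in the issue: ASCII letters and apostrophes only.
LATIN_ONLY = re.compile(r"[A-Za-z']+")

# Hypothetical replacement: for str patterns in Python 3, \w matches
# Unicode word characters (letters, digits, underscore). re.UNICODE is
# already the default there and is kept only to make the intent explicit.
UNICODE_WORDS = re.compile(r"[\w']+", re.UNICODE)


def split_words(text, pattern=UNICODE_WORDS):
    """Return the list of words found in text using the given pattern."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "привет мир don't"
    print(split_words(sample, LATIN_ONLY))  # ["don't"]
    print(split_words(sample))              # ['привет', 'мир', "don't"]
```

With the original pattern the Cyrillic words disappear entirely, while the Unicode-aware pattern keeps them and still handles apostrophes as before.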