Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
0.9.2-incubating
None
Environment: AWS Ubuntu 12.04, Oracle Java 7
Description
On some machines, UTF-8 text gets corrupted over the multilang protocol. Analysis traces the problem to JsonSerializer's use of InputStreamReader when reading from stdin.
When no charset is given, InputStreamReader uses the JVM default charset, which is usually UTF-8 but not always.
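The effect can be reproduced outside Storm. The following is a minimal sketch, not code from JsonSerializer: the class name CharsetMismatchDemo and the helper decode are hypothetical, and ISO-8859-1 stands in for whatever non-UTF-8 default an affected machine might have.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical demo class; it is not part of Storm.
public class CharsetMismatchDemo {

    // Decode the bytes through an InputStreamReader using the named charset.
    static String decode(byte[] bytes, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The charset a one-argument InputStreamReader would pick on this JVM.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // U+00E9 (e with acute accent) is the two bytes 0xC3 0xA9 in UTF-8.
        byte[] utf8 = "\u00e9".getBytes("UTF-8");

        // Decoding with the matching charset gives back one character.
        System.out.println(decode(utf8, "UTF-8").length());

        // Decoding with ISO-8859-1 (assumed stand-in for a wrong default)
        // turns each byte into its own character, so the text is corrupted.
        System.out.println(decode(utf8, "ISO-8859-1").length());
    }
}
```

A single non-ASCII character becomes two unrelated characters when the reader's charset does not match the bytes on the stream, which matches the corruption described here.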
Temporary Workaround:
Edit storm/conf/storm.yaml and enforce the default JVM charset as follows:
worker.childopts: "-Xmx768m -Dfile.encoding=UTF-8"
Required Fix in JsonSerializer:
Pass the string "UTF-8" to the InputStreamReader constructor as second argument.
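A minimal sketch of what that change looks like, assuming the reader is built directly on an input stream. The class Utf8LineReader and its methods are hypothetical and only illustrate the two-argument constructor; they are not the actual JsonSerializer code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper; it is not Storm's JsonSerializer.
public class Utf8LineReader {

    // Wrap the stream in a reader whose charset is pinned to UTF-8.
    // The second constructor argument overrides the JVM default charset.
    static BufferedReader open(InputStream in) throws IOException {
        return new BufferedReader(new InputStreamReader(in, "UTF-8"));
    }

    // Read the first line of the stream as UTF-8 text.
    static String readFirstLine(InputStream in) throws IOException {
        return open(in).readLine();
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for stdin here; a real caller would pass System.in.
        byte[] utf8 = "{\"msg\": \"caf\u00e9\"}\n".getBytes("UTF-8");
        String line = readFirstLine(new ByteArrayInputStream(utf8));
        // True regardless of the -Dfile.encoding setting of the JVM.
        System.out.println(line.equals("{\"msg\": \"caf\u00e9\"}"));
    }
}
```

With the charset named explicitly, the decoding no longer depends on file.encoding, so the storm.yaml workaround is not needed for reading.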
Notes:
The implementation already enforces UTF-8 when writing to stdout, so no fix is needed on the output side.
Python's simplejson and Ruby's json gem use UTF-8 as the default.