Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.8.0, 0.9.0
Fix Version/s: None
Component/s: None
Labels: None
Description
YarnJob passes the Config to the AM via an environment variable, SAMZA_CONFIG. After serializing the Config to JSON, it goes through Util.envVarEscape(), which I think is behaving improperly. Specifically, that method escapes single quotes globally, even inside double quotes. Consider the following config property:
expression="type == 'LINEAR'"
After encoding to JSON this looks like this:
{"expression":"type == 'LINEAR'"}
And after being run through Util.envVarEscape():
{\"expression\":\"type == \'LINEAR\'\"}
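The transformation above can be reproduced with a minimal sketch. It assumes the method does nothing more than put a backslash in front of every double and single quote, which matches the output shown but is inferred from that output, not copied from the Samza source. The class name is hypothetical.

```java
// Hypothetical stand-in for Util.envVarEscape(), inferred from the output in this report.
class EnvVarEscapeSketch {
    // Assumption: every " and every ' gets a backslash prefix, regardless of context.
    static String envVarEscape(String s) {
        return s.replace("\"", "\\\"").replace("'", "\\'");
    }

    public static void main(String[] args) {
        // The JSON-encoded config from the example above.
        String json = "{\"expression\":\"type == 'LINEAR'\"}";
        // Prints the escaped form, including the backslash before each single quote.
        System.out.println(envVarEscape(json));
    }
}
```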
I presume these values are being escaped because the YARN client is passing them through the shell at some point. But the escaping is too simplistic; single quotes should not be escaped within double quotes. As a result, the value arrives at the AppMaster as follows:
{"expression": "type == \'LINEAR\'"}
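One way to see why the backslash before the single quote survives: inside a POSIX double-quoted string, a backslash is only treated as an escape before `$`, a backtick, `"`, `\`, or a newline, and is kept literally otherwise. The sketch below simulates that rule and pairs it with a hypothetical alternative escaper that touches only those characters. Whether the YARN launch path actually wraps the value in double quotes is the presumption stated above, not something verified here, and all names are made up for illustration.

```java
// Sketch only: approximates double-quote handling in a POSIX shell.
// It ignores backslash-newline continuation and $ / backtick expansion.
class DoubleQuoteShellSketch {
    // Characters a backslash can escape inside a double-quoted shell string.
    private static final String SPECIAL = "\\\"$`";

    // Hypothetical alternative escaper: escape only what double quotes treat as special.
    static String escapeForDoubleQuotes(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) out.append('\\');
            out.append(c);
        }
        return out.toString();
    }

    // Drops a backslash only when it precedes a special character;
    // any other backslash is kept, as the shell would keep it.
    static String shellUnquote(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && SPECIAL.indexOf(s.charAt(i + 1)) >= 0) {
                out.append(s.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{\"expression\":\"type == 'LINEAR'\"}";
        // The escaped form from this report: the \' pairs come back unchanged.
        System.out.println(shellUnquote("{\\\"expression\\\":\\\"type == \\'LINEAR\\'\\\"}"));
        // The alternative escaper round-trips to the original JSON.
        System.out.println(shellUnquote(escapeForDoubleQuotes(json)).equals(json));
    }
}
```

Under these assumptions, leaving single quotes alone is enough for the example config to arrive intact.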
At which point Jackson chokes on it because \' is invalid JSON (invalid escape sequence):
Exception in thread "main" org.codehaus.jackson.JsonParseException: Unrecognized character escape ''' (code 39)
 at [Source: java.io.StringReader@1b6e1eff; line: 1, column: 2814]
    at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._handleUnrecognizedCharacterEscape(JsonParserMinimalBase.java:496)
    at org.codehaus.jackson.impl.ReaderBasedParser._decodeEscaped(ReaderBasedParser.java:1606)
    at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:1353)
    at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:1330)
    at org.codehaus.jackson.impl.ReaderBasedParser.getText(ReaderBasedParser.java:200)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
    at org.codehaus.jackson.map.deser.std.MapDeserializer._readAndBind(MapDeserializer.java:319)
    at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:249)
    at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:33)
    at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
    at org.apache.samza.config.serializers.JsonConfigSerializer$.fromJson(JsonConfigSerializer.scala:34)
    at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:72)
    at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
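The parser's complaint follows from the JSON grammar: after a backslash, a JSON string allows only `"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, or `u`. The sketch below is a naive scan for the first backslash that breaks that rule. It is not how Jackson works internally and is not a JSON parser; the class and method names are hypothetical.

```java
// Sketch only: locates the first escape sequence that JSON does not permit.
class JsonEscapeCheckSketch {
    // Characters allowed after a backslash in a JSON string (RFC 8259, section 7).
    private static final String VALID = "\"\\/bfnrtu";

    // Returns the index of the first backslash followed by a disallowed
    // character (or by nothing), or -1 if every escape is permitted.
    static int firstInvalidEscape(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '\\') continue;
            if (i + 1 >= text.length() || VALID.indexOf(text.charAt(i + 1)) < 0) return i;
            i++; // skip the escaped character so an escaped backslash is not re-read
        }
        return -1;
    }

    public static void main(String[] args) {
        // The value as it arrives at the AppMaster in this report.
        String received = "{\"expression\": \"type == \\'LINEAR\\'\"}";
        System.out.println(firstInvalidEscape(received));
    }
}
```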
This is particularly nasty since I don't see a way for any quotes, single or double, to be passed to the job successfully and remain intact. I know the way this config is passed has undergone some change, but I don't know the details, so I wanted to get this issue on record.