Samza / SAMZA-700

YarnJob mangles config properties containing quotes


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0, 0.9.0

    Description

      YarnJob passes the Config to the AM via an environment variable, SAMZA_CONFIG. After the Config is serialized to JSON, the result goes through Util.envVarEscape(), which I think is behaving improperly. Specifically, that method escapes single quotes globally, even inside double quotes. Consider the following config property:

      expression="type == 'LINEAR'"
      

      After encoding to JSON this looks like this:

      {"expression":"type == 'LINEAR'"}
      

      And after being run through Util.envVarEscape():

      {\"expression\":\"type == \'LINEAR\'\"}
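
The transformation above can be reproduced with a minimal sketch. The class and method names below are hypothetical stand-ins, and the body is only a guess at what Util.envVarEscape() does, inferred from the input and output shown in this report; the actual Samza code may differ.

```java
// Hypothetical stand-in for Util.envVarEscape(); inferred from the report, not copied from Samza.
final class NaiveEnvVarEscape {
    // Backslash-escapes every double quote and every single quote,
    // regardless of the quoting context the value later ends up in.
    static String escape(String value) {
        return value.replace("\"", "\\\"").replace("'", "\\'");
    }

    public static void main(String[] args) {
        // JSON form of the config property expression="type == 'LINEAR'"
        String json = "{\"expression\":\"type == 'LINEAR'\"}";
        System.out.println(escape(json));
    }
}
```

Because the replacement is applied globally, the single quotes inside the JSON string value pick up a backslash just like the double quotes around it.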
      

      I presume these values are being escaped because the YARN client is passing them through the shell at some point. But the escaping is too simplistic; single quotes should not be escaped within double quotes. As a result, the value arrives at the AppMaster as follows:

      {"expression": "type == \'LINEAR\'"}
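
One possible explanation for this result, assuming the escaped value is expanded inside double quotes by a POSIX shell (which this report presumes but has not verified): inside double quotes, a backslash acts as an escape only before `$`, a backquote, `"`, `\`, or a newline. The sketch below models just that rule; the class name is hypothetical and it is not a full shell parser.

```java
// Sketch of POSIX backslash handling inside a double-quoted word (assumption: the value is
// wrapped in double quotes somewhere on the way to the AM). Not a full shell parser.
final class DoubleQuoteUnescape {
    static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(i + 1);
                if (next == '\n') {          // backslash-newline is a line continuation
                    i++;
                    continue;
                }
                if ("$`\"\\".indexOf(next) >= 0) {
                    out.append(next);        // the backslash is consumed
                    i++;
                    continue;
                }
            }
            out.append(c);                   // any other backslash, such as in \', stays literal
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "{\\\"expression\\\":\\\"type == \\'LINEAR\\'\\\"}";
        System.out.println(unescape(escaped));
    }
}
```

Under this model the `\"` sequences collapse back to plain double quotes, while each `\'` survives with its backslash, which is consistent with the value seen at the AppMaster.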
      

      Jackson then fails to parse the value because \' is not a valid JSON escape sequence:

      Exception in thread "main" org.codehaus.jackson.JsonParseException: Unrecognized character escape ''' (code 39)
       at [Source: java.io.StringReader@1b6e1eff; line: 1, column: 2814]
              at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
              at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
              at org.codehaus.jackson.impl.JsonParserMinimalBase._handleUnrecognizedCharacterEscape(JsonParserMinimalBase.java:496)
              at org.codehaus.jackson.impl.ReaderBasedParser._decodeEscaped(ReaderBasedParser.java:1606)
              at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:1353)
              at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:1330)
              at org.codehaus.jackson.impl.ReaderBasedParser.getText(ReaderBasedParser.java:200)
              at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
              at org.codehaus.jackson.map.deser.std.MapDeserializer._readAndBind(MapDeserializer.java:319)
              at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:249)
              at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:33)
              at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
              at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
              at org.apache.samza.config.serializers.JsonConfigSerializer$.fromJson(JsonConfigSerializer.scala:34)
              at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:72)
              at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
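
To illustrate why the parser rejects the input, here is a small standalone check for the escape characters that the JSON grammar (RFC 8259) allows after a backslash: `"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, and `u`. It is a simplified sketch with a hypothetical class name; it does not use Jackson, does not validate the hex digits after `\u`, and is not a JSON parser.

```java
// Simplified check for invalid backslash escapes in JSON text; illustrative only.
final class JsonEscapeCheck {
    // Characters that may legally follow a backslash inside a JSON string (RFC 8259).
    private static final String VALID = "\"\\/bfnrtu";

    // Returns the index of the first backslash that starts an invalid escape, or -1 if none.
    static int firstInvalidEscape(String json) {
        for (int i = 0; i < json.length(); i++) {
            if (json.charAt(i) == '\\') {
                if (i + 1 >= json.length() || VALID.indexOf(json.charAt(i + 1)) < 0) {
                    return i;
                }
                i++; // skip the escaped character
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String received = "{\"expression\":\"type == \\'LINEAR\\'\"}";
        int idx = firstInvalidEscape(received);
        System.out.println(idx >= 0 ? "invalid escape at index " + idx : "no invalid escapes");
    }
}
```

A single quote is not in the allowed set, so any `\'` that reaches the AppMaster makes the whole config unparseable.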
      

      This is particularly nasty since I don't see a way for any quotes, single or double, to get passed to the job successfully and remain intact. I know the way this config is passed has undergone some change, but I don't know the details, so I wanted to get this issue on record.
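
For discussion, here is a sketch of a context-aware alternative that escapes only the characters that are special inside shell double quotes. It is not a proposed patch. It assumes the value ends up inside a double-quoted word in a POSIX shell command, which has not been confirmed for the YARN launch path, and the class name is hypothetical.

```java
// Hypothetical alternative to a global quote escape; assumes a double-quoted shell context.
final class DoubleQuoteEscape {
    // Backslash-escapes only ", \, $ and `, the characters that keep a special
    // meaning inside shell double quotes. Single quotes are left untouched.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '\\' || c == '$' || c == '`') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{\"expression\":\"type == 'LINEAR'\"}";
        System.out.println(escape(json));
    }
}
```

With this approach the single quotes never receive a backslash, so nothing is left over for Jackson to reject once the shell has removed the escapes on the double quotes. Whether this is enough depends on how the environment variable is actually quoted when the container is launched.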

          People

            Assignee: Unassigned
            Reporter: Tommy Becker (twbecker)
            Votes: 1
            Watchers: 3
