Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Version/s: 0.24.1, 0.26.0
- Sprint: Mesosphere Sprint 21
- Story Points: 2
Description
Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a Protobuf "bytes" value, which can contain arbitrary non-UTF-8 data.
If such a field is present, it appears to be written into the JSON output verbatim, with no regard for character encoding:
    0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.|
    0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac|
    0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".|
    0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u0000\u0005ur\|
    0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u0000\u000f[Lsca|
    0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00|
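The two bytes right after `"data":"` are 0xAC 0xED, the magic number that opens a Java serialization stream. 0xAC is a UTF-8 continuation byte and can never start a character, so any strict UTF-8 decoder will reject this response. The dump also shows that control bytes were escaped (`\u0000`, `\u0005`) while bytes of 0x80 or above were copied through raw. Below is a minimal C++ sketch of a strict UTF-8 check (per RFC 3629) that a consumer or a regression test could use to detect this; `isValidUtf8` is a hypothetical helper, not part of Mesos or stout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8 (RFC 3629).
// Rejects stray continuation bytes, truncated sequences, overlong encodings,
// UTF-16 surrogates, and code points above U+10FFFF.
bool isValidUtf8(const std::string& s) {
  // Smallest code point allowed for a sequence of each length (index = length).
  static const unsigned int kMinCodePoint[5] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len;
    unsigned int cp;
    if (c < 0x80) {  // ASCII
      ++i;
      continue;
    } else if ((c & 0xE0) == 0xC0) {
      len = 2; cp = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
      len = 3; cp = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
      len = 4; cp = c & 0x07;
    } else {
      return false;  // 0x80..0xBF (e.g. 0xAC) cannot start a character
    }
    if (i + len > n) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;  // expected a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    if (cp < kMinCodePoint[len]) return false;            // overlong encoding
    if (cp > 0x10FFFF) return false;                      // out of range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;       // surrogate
    i += len;
  }
  return true;
}
```

Run against the bytes in the dump, the check fails on the very first byte of the `data` value.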
I suspect this is because the HTTP API emits executorInfo.data directly:
    JSON::Object model(const ExecutorInfo& executorInfo) {
      JSON::Object object;
      object.values["executor_id"] = executorInfo.executor_id().value();
      object.values["name"] = executorInfo.name();
      object.values["data"] = executorInfo.data();
      object.values["framework_id"] = executorInfo.framework_id().value();
      object.values["command"] = model(executorInfo.command());
      object.values["resources"] = model(executorInfo.resources());
      return object;
    }
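Because `data` is a Protobuf `bytes` field, one way to make it JSON-safe is to base64-encode it before storing it in the object; this is also how the proto3 canonical JSON mapping represents `bytes` fields. The sketch below assumes consumers are updated to base64-decode `data`. `base64Encode` is a hypothetical helper written for illustration (standard RFC 4648 alphabet with `=` padding), not a stout or Mesos function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: standard base64 (RFC 4648) with '=' padding.
// The output is pure ASCII, so it can be embedded in a JSON string
// no matter which bytes the input contains.
std::string base64Encode(const std::string& bytes) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve(((bytes.size() + 2) / 3) * 4);
  std::size_t i = 0;
  // Encode each full 3-byte group as 4 output characters.
  for (; i + 2 < bytes.size(); i += 3) {
    unsigned int v = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8) |
                     static_cast<unsigned char>(bytes[i + 2]);
    out += kAlphabet[(v >> 18) & 0x3F];
    out += kAlphabet[(v >> 12) & 0x3F];
    out += kAlphabet[(v >> 6) & 0x3F];
    out += kAlphabet[v & 0x3F];
  }
  // Encode the trailing 1 or 2 bytes, padding the output with '='.
  const std::size_t rest = bytes.size() - i;
  if (rest == 1) {
    unsigned int v = static_cast<unsigned char>(bytes[i]) << 16;
    out += kAlphabet[(v >> 18) & 0x3F];
    out += kAlphabet[(v >> 12) & 0x3F];
    out += "==";
  } else if (rest == 2) {
    unsigned int v = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8);
    out += kAlphabet[(v >> 18) & 0x3F];
    out += kAlphabet[(v >> 12) & 0x3F];
    out += kAlphabet[(v >> 6) & 0x3F];
    out += '=';
  }
  return out;
}
```

Under these assumptions the offending line would become `object.values["data"] = base64Encode(executorInfo.data());`, and the Java serialization header `AC ED 00 05` seen in the dump would be emitted as the ASCII string `rO0ABQ==`.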
I think this is because stout's custom JSON library has no notion of a byte array. I'm guessing that an implicit conversion causes the field to be written as a String instead, but String serialization does not handle non-ASCII data:
    inline std::ostream& operator<<(std::ostream& out, const String& string) {
      // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
      // See RFC4627 for the JSON string specification.
      return out << picojson::value(string.value).serialize();
    }
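An alternative that keeps the field a plain JSON string is to escape every byte outside printable ASCII as `\u00XX`, treating each raw byte as a code point in U+0000 through U+00FF. The output is then pure ASCII and therefore always valid UTF-8, unlike the dump, where control bytes were escaped but bytes of 0x80 or above went out raw. This is a hypothetical sketch; `jsonQuoteBytes` is not stout's or picojson's API.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical escaper: serializes an arbitrary byte string as a JSON string
// literal, treating each byte as the code point U+0000..U+00FF. Quotes and
// backslashes are escaped; every other byte outside printable ASCII becomes
// \u00XX, so the result is pure ASCII regardless of the input.
std::string jsonQuoteBytes(const std::string& bytes) {
  std::string out = "\"";
  for (char ch : bytes) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (c == '"') {
      out += "\\\"";
    } else if (c == '\\') {
      out += "\\\\";
    } else if (c >= 0x20 && c < 0x7F) {
      out += static_cast<char>(c);  // printable ASCII passes through
    } else {
      char buf[7];  // "\u00XX" plus the terminating NUL
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
      out += buf;
    }
  }
  out += '"';
  return out;
}
```

The trade-off: a consumer must map each decoded character back to one byte to recover the data, genuine UTF-8 text such as `é` (C3 A9) would come back as two characters, and each non-ASCII byte grows to six bytes of output. Base64, as used by the proto3 JSON mapping, is usually the cleaner contract for `bytes` fields.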
Thank you for any assistance here. Our cluster is currently entirely down: the frameworks cannot parse the invalid JSON produced (it is not even valid UTF-8).
Issue Links
- is related to: MESOS-3794 Master should not store arbitrarily sized data in ExecutorInfo. (Accepted)
- relates to: MESOS-4642 Mesos Agent Json API can dump binary data from log files out as invalid JSON. (Accepted)