Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
On an x/86 system, I have a jupyter notebook server running. On a z/OS system, I have a Jupyter Kernel Gateway running. The notebook server connects to the kernel gateway which gives the server access to Apache Toree (what is currently in master branch from about two weeks ago) . Apache Toree then interacts with the Spark cluster (2.0.2) The scala version is 2.11.8. The python version I am using is Python 3.6 (although I also tried this with python 2.7 and I also ran into the same problem)On an x/86 system, I have a jupyter notebook server running. On a z/OS system, I have a Jupyter Kernel Gateway running. The notebook server connects to the kernel gateway which gives the server access to Apache Toree (what is currently in master branch from about two weeks ago) . Apache Toree then interacts with the Spark cluster (2.0.2) The scala version is 2.11.8. The python version I am using is Python 3.6 (although I also tried this with python 2.7 and I also ran into the same problem)
Description
I am encountering a signature mismatch that is caused when using unicode characters > 128
The environment I am currently in:
On an x/86 system, I have a jupyter notebook server running. On a z/OS system, I have a Jupyter Kernel Gateway running. The notebook server connects to the kernel gateway which gives the server access to Apache Toree (what is currently in master branch from about two weeks ago) . Apache Toree then interacts with the Spark cluster (2.0.2) The scala version is 2.11.8. The python version I am using is Python 3.6 (although I also tried this with python 2.7 and I also ran into the same problem)
This error occurs when I use either python 2.7 and python 3.6
In a jupyter notebook I try running the following:
print("่")
The notebook hangs up and the following is produced from the kernel gateway logs:
[E 170413 15:16:27 web:1548] Uncaught exception GET /api/kernels/e6f0c109-d3b2-4254-85a6-1eea95f7175b/channels (9.12.41.240)
HTTPServerRequest(protocol='http', host='9.12.41.72:9099', method='GET', uri='/api/kernels/e6f0c109-d3b2-4254-85a6-1eea95f7175b/channels', version='HTTP/1.1', remote_ip='9.12.41.240', headers=
)
Traceback (most recent call last):
File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/tornado/web.py", line 1425, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 191, inย
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/jupyter_kernel_gateway-1.2.1-py2.7.egg/kernel_gateway/services/kernels/handlers.py", line 172, in _on_zmq_reply
super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg_list)
File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/notebook/services/kernels/handlers.py", line 296, in _on_zmq_reply
msg = self.session.deserialize(fed_msg_list)
File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/jupyter_client/session.py", line 859, in deserialize
raise ValueError("Invalid Signature: %r" % signature)
ValueError: Invalid Signature: '4324e46ac9c58336e781be2bff631fb7e3019f1ce58f5795544a8d54cdfa0f0a'
Upon further investigation, I wanted to see the messages that were being received by the zmq socket and what was being sent to the zmq socket. Here is what I found when running the cell with print("่"):
The CONTENT STRING that is received by zmq socket in hexadecimal:
7b22636f6465223a227072696e74285c223f5c2229222c22657865637574696f6e5f636f756e74223a317d
which is :
{"code":"print(\\"?\\")","execution_count":1}Notice the "3f" 3f is "?" in utf-8 encoding.
The CONTENT STRING that is being SIGNED by Apache Toree over the zmq socket in hexadecimal:
7b22636f6465223a227072696e74285c22e892845c2229222c22657865637574696f6e5f636f756e74223a317d
Now, if you compare both this hexadecimal string and the one that is being received by zmq socket it is different! The difference is the "3f" in what is being received and the "e89284" in what is being signed. Note that e89284 equates to ่ in utf-8 encoding.
The STRING that is being SENT by Apache Toree over the zmq socket in hexadecimal:
536f6d65285b2033666438623739382d303663312d343634322d616130392d6635636232633664326636662c203c4944537c4d53473e2c20343763333766326264366161316663353335636230343466313331663838313861343462343164383066306463643332316239343934386239333561303135642c207b226d73675f6964223a2266633934623732612d646466642d343263372d386230662d643034626561386533616530222c22757365726e616d65223a2254414e4759222c2273657373696f6e223a2231663061366439622d656431642d343132312d623566342d386330366163613939323261222c226d73675f74797065223a22657865637574655f696e707574222c2276657273696f6e223a22352e30227d2c207b226d73675f6964223a224342463035463446323945443430323538423835323336373644383937393130222c22757365726e616d65223a22757365726e616d65222c2273657373696f6e223a224346443534443639324334303446463138333330324142424238433431323533222c226d73675f74797065223a22657865637574655f72657175657374222c2276657273696f6e223a22352e30227d2c207b2274696d657374616d70223a2231343934383736343036353534227d2c207b22636f6465223a227072696e74285c223f5c2229222c22657865637574696f6e5f636f756e74223a317d205d29
In english that equates to:
Some([ 3fd8b798-06c1-4642-aa09-f5cb2c6d2f6f,<IDS|MSG>,47c37f2bd6aa1fc535cb044f131f8818a44b41d80f0dcd321b94948b935a015d,
{"msg_id":"fc94b72a-ddfd-42c7-8b0f-d04bea8e3ae0","username":"TANGY","session":"1f0a6d9b-ed1d-4121-b5f4-8c06aca9922a","msg_type":"execute_input","version":"5.0"},
{"msg_id":"CBF05F4F29ED40258B8523676D897910","username":"username","session":"CFD54D692C404FF183302ABBB8C41253","msg_type":"execute_request","version":"5.0"},
{"timestamp":"1494876406554"},
{"code":"print(\"?\")","execution_count":1}])
The part that is interesting to me is the content string, I parsed out the content string of the hexadecimal message above:
7b22636f6465223a227072696e74285c223f5c2229222c22657865637574696f6e5f636f756e74223a317d
This is where I'm guessing the invalid mismatch occurs. The content string that apache toree is signing off on is different from the content string that is is sending over. Notice that the content string that is being sent over is exactly the same as the content string that is being received by zmq socket (both have the invalid 3f)
This is where I put my debug statements in case it matters:
communication/src/main/scala/org/apache/toree/communication/socket/ZeroMQSocketRunnable.scala:
/**
- Sends the next outbound message from the outbound message queue.
* - @param socket The socket to use when sending the message
* - @return True if a message was sent, otherwise false
*/
protected def processNextOutboundMessage(socket: ZMQ.Socket): Boolean = {
val message = Option(outboundMessages.poll())
if (message != None){
logger.warn(s"Message that is SENT IN HEX:" + String.format("%040x", new BigInteger(1, s"${message}".getBytes(StandardCharsets.UTF_8))))
logger.warn(s"Message that is SENT:" + s"${message}")
}
message.foreach(_.send(socket))
message.nonEmpty
}
And then also in:
communication/src/main/scala/org/apache/toree/communication/security/SignatureProducerActor.scala
class SignatureProducerActor(
private val hmac: Hmac
) extends Actor with LogLike with OrderedSupport {
override def receive: Receive = {
case message: KernelMessage => withProcessing {
logger.warn(s"Message that is being signed (HEADER):" + s"${message.header}")
logger.warn(s"Message that is being signed (PARENT HEADER):" + s"${message.parentHeader}")
logger.warn(s"Message that is being signed (METADATA):" + s"${message.metadata}")
logger.warn(s"Message that is being signed IN HEX (CONTENT STRING):" + String.format("%040x", new BigInteger(1, s"${message.contentString}".getBytes(StandardCharsets.UTF_8))))
logger.warn(s"Message that is being signed (CONTENT STRING):" + s"${message.contentString}")
val signature = hmac(
Json.stringify(Json.toJson(message.header)),
Json.stringify(Json.toJson(message.parentHeader)),
Json.stringify(Json.toJson(message.metadata)),
message.contentString
)
sender ! signature
}
}
Also something else I noticed was when I ran jupyter notebook/toree from source (make dev) I noticed that in the message, the hexadecimal representation of the content string gets sent over as opposed to the string itself i.e.
Some([ 9FF3E30DB4AD4ED2B0C6795A5AF321A6, <IDS|MSG>,fd19b14775db834185f1fafd1d22061a903898db98b25582700de5230a85c9c4,
{"msg_id":"4c18b424-4d18-4f3e-bddb-b035c638ab7e","username":"root","session":"2142f120-8287-4723-9d0e-05d85260fb0b","msg_type":"execute_input","version":"5.0"},
{"msg_id":"D872D7431AF941C4865B9D255CB01A5A","username":"username","session":"9FF3E30DB4AD4ED2B0C6795A5AF321A6","msg_type":"execute_request","version":"5.0"},
{"timestamp":"1494877223980"}, 7B22636F6465223A227072696E74285C22E892845C2229222C22657865637574696F6E5F636F756E74223A317D ])
In my case I see this:
Some([ 3fd8b798-06c1-4642-aa09-f5cb2c6d2f6f,<IDS|MSG>,47c37f2bd6aa1fc535cb044f131f8818a44b41d80f0dcd321b94948b935a015d,
{"msg_id":"fc94b72a-ddfd-42c7-8b0f-d04bea8e3ae0","username":"TANGY","session":"1f0a6d9b-ed1d-4121-b5f4-8c06aca9922a","msg_type":"execute_input","version":"5.0"},
{"msg_id":"CBF05F4F29ED40258B8523676D897910","username":"username","session":"CFD54D692C404FF183302ABBB8C41253","msg_type":"execute_request","version":"5.0"},
{"timestamp":"1494876406554"},
{"code":"print(\"?\")","execution_count":1}])
Thanks for the help