TOREE / TOREE-409

Signature Mismatch between jupyter_client and Apache Toree


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Kernel
    • Labels: None

    Description

      I am encountering a signature mismatch when a cell contains Unicode characters outside the ASCII range (code points 128 and above).

      The environment I am currently in:

      On an x86 system, I have a Jupyter Notebook server running. On a z/OS system, I have a Jupyter Kernel Gateway running. The notebook server connects to the kernel gateway, which gives the server access to Apache Toree (the master branch as of about two weeks ago). Apache Toree then interacts with a Spark 2.0.2 cluster. The Scala version is 2.11.8. I am using Python 3.6, but I also tried Python 2.7 and ran into the same problem.

      In a Jupyter notebook, I run the following:

      print("蒄")

      The notebook hangs, and the kernel gateway logs show the following:

      [E 170413 15:16:27 web:1548] Uncaught exception GET /api/kernels/e6f0c109-d3b2-4254-85a6-1eea95f7175b/channels (9.12.41.240)
      HTTPServerRequest(protocol='http', host='9.12.41.72:9099', method='GET', uri='/api/kernels/e6f0c109-d3b2-4254-85a6-1eea95f7175b/channels', version='HTTP/1.1', remote_ip='9.12.41.240', headers={'Upgrade': 'websocket', 'Accept-Encoding': 'gzip', 'Sec-Websocket-Version': '13', 'Connection': 'Upgrade', 'Sec-Websocket-Key': 'evzOnn7Up3BD/6Grb87mCQ==', 'Host': '9.12.41.72:9099', 'Authorization': 'token commander'})
      Traceback (most recent call last):
        File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/tornado/web.py", line 1425, in _stack_context_handle_exception
          raise_exc_info((type, value, traceback))
        File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
          ret = fn(*args, **kwargs)
        File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
          self.on_recv(lambda msg: callback(self, msg), copy=copy)
        File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/jupyter_kernel_gateway-1.2.1-py2.7.egg/kernel_gateway/services/kernels/handlers.py", line 172, in _on_zmq_reply
          super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg_list)
        File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/notebook/services/kernels/handlers.py", line 296, in _on_zmq_reply
          msg = self.session.deserialize(fed_msg_list)
        File "/Voyager/Hamlet/python/python-2017-04-12-py27/python27/lib/python2.7/site-packages/jupyter_client/session.py", line 859, in deserialize
          raise ValueError("Invalid Signature: %r" % signature)
      ValueError: Invalid Signature: '4324e46ac9c58336e781be2bff631fb7e3019f1ce58f5795544a8d54cdfa0f0a'

      To investigate further, I logged the messages received by the ZMQ socket and the messages being sent to it. Here is what I found when running the cell with print("蒄"):

      The CONTENT STRING received by the ZMQ socket, in hexadecimal:

      7b22636f6465223a227072696e74285c223f5c2229222c22657865637574696f6e5f636f756e74223a317d

      which decodes to:

      {"code":"print(\"?\")","execution_count":1}

      Note the byte 3f: it is "?" in UTF-8 (and ASCII), not the character I typed.
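For context, 0x3f is also what Java's String.getBytes(Charset) produces when the target charset cannot represent a character: it substitutes the charset's default replacement, which is "?" for charsets such as US-ASCII. One possible explanation, which this report does not confirm, is that the content passes through a non-UTF-8 charset (for example a platform default) before it is written to the socket. The sketch below uses US-ASCII only as an example of such a charset; the object name is hypothetical.

```scala
import java.nio.charset.StandardCharsets

object ReplacementByteDemo {
  // Hex-encode each byte as two lowercase digits.
  def hex(bytes: Array[Byte]): String = bytes.map(b => f"${b & 0xff}%02x").mkString

  val ch = "\u8484" // the character from the failing cell (U+8484)

  // UTF-8 can represent U+8484, so the result is three bytes.
  val utf8Hex: String = hex(ch.getBytes(StandardCharsets.UTF_8))

  // US-ASCII cannot; getBytes(Charset) substitutes the replacement byte '?' (0x3f).
  val asciiHex: String = hex(ch.getBytes(StandardCharsets.US_ASCII))

  def main(args: Array[String]): Unit = {
    println(s"UTF-8:    $utf8Hex")  // e89284
    println(s"US-ASCII: $asciiHex") // 3f
  }
}
```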

      The CONTENT STRING that Apache Toree SIGNS, in hexadecimal:

      7b22636f6465223a227072696e74285c22e892845c2229222c22657865637574696f6e5f636f756e74223a317d

      Comparing this with the hex received by the ZMQ socket shows that they differ: the received content has "3f" where the signed content has "e89284". Note that e89284 is the UTF-8 encoding of 蒄 (U+8484).
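The comparison above can be checked mechanically. This is a minimal sketch (the object and helper names are hypothetical) that decodes the two hex dumps copied from this report and reports the first byte at which they differ.

```scala
import java.nio.charset.StandardCharsets

object ContentDiff {
  // Decode a hex dump into raw bytes (two hex digits per byte).
  def fromHex(s: String): Array[Byte] =
    s.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray

  // Offset of the first byte at which two arrays differ, or None if identical.
  def firstDiff(a: Array[Byte], b: Array[Byte]): Option[Int] = {
    val i = a.zip(b).indexWhere { case (x, y) => x != y }
    if (i >= 0) Some(i)
    else if (a.length != b.length) Some(math.min(a.length, b.length))
    else None
  }

  // Hex dumps copied from the debug output in this report.
  val received: Array[Byte] = fromHex("7b22636f6465223a227072696e74285c223f5c2229222c22657865637574696f6e5f636f756e74223a317d")
  val signed: Array[Byte] = fromHex("7b22636f6465223a227072696e74285c22e892845c2229222c22657865637574696f6e5f636f756e74223a317d")

  def main(args: Array[String]): Unit = {
    println(new String(received, StandardCharsets.UTF_8)) // {"code":"print(\"?\")","execution_count":1}
    println(new String(signed, StandardCharsets.UTF_8))
    println(s"first difference at byte ${firstDiff(received, signed).getOrElse(-1)}") // 17
    println(s"${received.length} vs ${signed.length} bytes") // 43 vs 45 bytes
  }
}
```

The one received byte 3f stands where the signed content has the three bytes e8 92 84, so every later byte is shifted as well.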

      The full message that Apache Toree SENDS over the ZMQ socket, in hexadecimal:

      536f6d65285b2033666438623739382d303663312d343634322d616130392d6635636232633664326636662c203c4944537c4d53473e2c20343763333766326264366161316663353335636230343466313331663838313861343462343164383066306463643332316239343934386239333561303135642c207b226d73675f6964223a2266633934623732612d646466642d343263372d386230662d643034626561386533616530222c22757365726e616d65223a2254414e4759222c2273657373696f6e223a2231663061366439622d656431642d343132312d623566342d386330366163613939323261222c226d73675f74797065223a22657865637574655f696e707574222c2276657273696f6e223a22352e30227d2c207b226d73675f6964223a224342463035463446323945443430323538423835323336373644383937393130222c22757365726e616d65223a22757365726e616d65222c2273657373696f6e223a224346443534443639324334303446463138333330324142424238433431323533222c226d73675f74797065223a22657865637574655f72657175657374222c2276657273696f6e223a22352e30227d2c207b2274696d657374616d70223a2231343934383736343036353534227d2c207b22636f6465223a227072696e74285c223f5c2229222c22657865637574696f6e5f636f756e74223a317d205d29

      Decoded as text, that is:

      Some([ 3fd8b798-06c1-4642-aa09-f5cb2c6d2f6f, <IDS|MSG>, 47c37f2bd6aa1fc535cb044f131f8818a44b41d80f0dcd321b94948b935a015d,
        {"msg_id":"fc94b72a-ddfd-42c7-8b0f-d04bea8e3ae0","username":"TANGY","session":"1f0a6d9b-ed1d-4121-b5f4-8c06aca9922a","msg_type":"execute_input","version":"5.0"},
        {"msg_id":"CBF05F4F29ED40258B8523676D897910","username":"username","session":"CFD54D692C404FF183302ABBB8C41253","msg_type":"execute_request","version":"5.0"},
        {"timestamp":"1494876406554"},
        {"code":"print(\"?\")","execution_count":1} ])

      The interesting part is the content string. Here it is, extracted from the hexadecimal message above:

      7b22636f6465223a227072696e74285c223f5c2229222c22657865637574696f6e5f636f756e74223a317d

      I suspect this is where the signature mismatch occurs: the content string that Apache Toree signs differs from the content string it sends. Note that the content string being sent is exactly the same as the one received by the ZMQ socket (both contain the invalid 3f).
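Why this must fail: in the Jupyter wire protocol, the sender computes an HMAC (by default HMAC-SHA256, keyed with the key from the kernel's connection file) over the serialized header, parent header, metadata and content, in that order. The receiver, here jupyter_client's Session.deserialize, recomputes it over the frames it actually received. The sketch below uses only the JDK's javax.crypto API. The key is hypothetical, and the frames are shortened versions of the ones in this report.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

object SignatureMismatchSketch {
  // HMAC-SHA256 over the frames in order, hex-encoded.
  def sign(key: String, frames: Seq[String]): String = {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(new SecretKeySpec(key.getBytes(UTF_8), "HmacSHA256"))
    frames.foreach(frame => mac.update(frame.getBytes(UTF_8)))
    mac.doFinal().map(b => f"${b & 0xff}%02x").mkString
  }

  // Receiver side: recompute over the frames actually received and compare.
  def verify(key: String, frames: Seq[String], signature: String): Boolean =
    MessageDigest.isEqual(sign(key, frames).getBytes(UTF_8), signature.getBytes(UTF_8))

  // Hypothetical key; a real one comes from the connection file.
  val key = "example-key"
  val header = "{\"msg_type\":\"execute_input\"}"
  val parentHeader = "{\"msg_type\":\"execute_request\"}"
  val metadata = "{\"timestamp\":\"1494876406554\"}"

  def content(ch: String): String =
    "{\"code\":\"print(\\\"" + ch + "\\\")\",\"execution_count\":1}"

  // The content containing U+8484 is signed, but the frame on the wire holds '?'.
  val signature: String = sign(key, Seq(header, parentHeader, metadata, content("\u8484")))
  val receivedFrames: Seq[String] = Seq(header, parentHeader, metadata, content("?"))

  def main(args: Array[String]): Unit = {
    println(verify(key, receivedFrames, signature)) // false: content bytes differ
    println(verify(key, Seq(header, parentHeader, metadata, content("\u8484")), signature)) // true
  }
}
```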

      In case it matters, this is where I put my debug statements:

      communication/src/main/scala/org/apache/toree/communication/socket/ZeroMQSocketRunnable.scala:

      /**
       * Sends the next outbound message from the outbound message queue.
       *
       * @param socket The socket to use when sending the message
       *
       * @return True if a message was sent, otherwise false
       */
      protected def processNextOutboundMessage(socket: ZMQ.Socket): Boolean = {
        val message = Option(outboundMessages.poll())
        // Debug logging (needs java.math.BigInteger and java.nio.charset.StandardCharsets).
        // Note: this logs the UTF-8 bytes of the message's toString, i.e. "Some(...)".
        if (message.isDefined) {
          logger.warn(s"Message that is SENT IN HEX:" + String.format("%040x", new BigInteger(1, s"${message}".getBytes(StandardCharsets.UTF_8))))
          logger.warn(s"Message that is SENT:" + s"${message}")
        }
        message.foreach(_.send(socket))

        message.nonEmpty
      }

      And then also in:

      communication/src/main/scala/org/apache/toree/communication/security/SignatureProducerActor.scala

      class SignatureProducerActor(
        private val hmac: Hmac
      ) extends Actor with LogLike with OrderedSupport {
        override def receive: Receive = {
          case message: KernelMessage => withProcessing {
            // Debug logging of each part that goes into the signature
            logger.warn(s"Message that is being signed (HEADER):" + s"${message.header}")
            logger.warn(s"Message that is being signed (PARENT HEADER):" + s"${message.parentHeader}")
            logger.warn(s"Message that is being signed (METADATA):" + s"${message.metadata}")
            logger.warn(s"Message that is being signed IN HEX (CONTENT STRING):" + String.format("%040x", new BigInteger(1, s"${message.contentString}".getBytes(StandardCharsets.UTF_8))))
            logger.warn(s"Message that is being signed (CONTENT STRING):" + s"${message.contentString}")
            val signature = hmac(
              Json.stringify(Json.toJson(message.header)),
              Json.stringify(Json.toJson(message.parentHeader)),
              Json.stringify(Json.toJson(message.metadata)),
              message.contentString
            )
            sender ! signature
          }
        }
      }

      I also noticed that when I ran Jupyter Notebook and Toree from source (make dev), the content string appears in the sent message as its hexadecimal representation rather than as the string itself, i.e.:

      Some([ 9FF3E30DB4AD4ED2B0C6795A5AF321A6, <IDS|MSG>, fd19b14775db834185f1fafd1d22061a903898db98b25582700de5230a85c9c4,
        {"msg_id":"4c18b424-4d18-4f3e-bddb-b035c638ab7e","username":"root","session":"2142f120-8287-4723-9d0e-05d85260fb0b","msg_type":"execute_input","version":"5.0"},
        {"msg_id":"D872D7431AF941C4865B9D255CB01A5A","username":"username","session":"9FF3E30DB4AD4ED2B0C6795A5AF321A6","msg_type":"execute_request","version":"5.0"},
        {"timestamp":"1494877223980"},
        7B22636F6465223A227072696E74285C22E892845C2229222C22657865637574696F6E5F636F756E74223A317D ])

      In my z/OS setup, by contrast, I see this:

      Some([ 3fd8b798-06c1-4642-aa09-f5cb2c6d2f6f, <IDS|MSG>, 47c37f2bd6aa1fc535cb044f131f8818a44b41d80f0dcd321b94948b935a015d,
        {"msg_id":"fc94b72a-ddfd-42c7-8b0f-d04bea8e3ae0","username":"TANGY","session":"1f0a6d9b-ed1d-4121-b5f4-8c06aca9922a","msg_type":"execute_input","version":"5.0"},
        {"msg_id":"CBF05F4F29ED40258B8523676D897910","username":"username","session":"CFD54D692C404FF183302ABBB8C41253","msg_type":"execute_request","version":"5.0"},
        {"timestamp":"1494876406554"},
        {"code":"print(\"?\")","execution_count":1} ])
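The two runs can be compared directly. In the make dev run, the content frame holds the same bytes that Toree signs in the z/OS run, including e89284. In the z/OS run, the content frame holds 3f instead. The check below decodes the hex dumps copied from this report; the object name is hypothetical.

```scala
import java.nio.charset.StandardCharsets

object DevRunCheck {
  // Decode a hex dump (upper- or lowercase) into raw bytes.
  def fromHex(s: String): Array[Byte] =
    s.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray

  // Content frame sent in the make dev run (uppercase, as logged).
  val devFrame: Array[Byte] = fromHex("7B22636F6465223A227072696E74285C22E892845C2229222C22657865637574696F6E5F636F756E74223A317D")
  // Content string Toree signed in the z/OS run (lowercase, as logged).
  val signedOnZos: Array[Byte] = fromHex("7b22636f6465223a227072696e74285c22e892845c2229222c22657865637574696f6e5f636f756e74223a317d")

  val sameBytes: Boolean = devFrame.sameElements(signedOnZos)

  def main(args: Array[String]): Unit = {
    println(sameBytes) // true
    println(new String(devFrame, StandardCharsets.UTF_8))
  }
}
```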

      Thanks for the help.

People

    • Assignee: Unassigned
    • Reporter: Yunli Tang (tangy327)
    • Votes: 0
    • Watchers: 1
