Solr / SOLR-2520

JSONResponseWriter w/json.wrf can produce invalid javascript depending on unicode chars in response data

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 3.2
    • Component/s: None
    • Labels:
      None

      Description

      Please see http://timelessrepo.com/json-isnt-a-javascript-subset.

      If a stored field contains Unicode characters that are valid in JSON but not valid in JavaScript, and you use the query option that asks for JSONP (json.wrf), Solr does not escape them, resulting in content that explodes on contact with browsers. That is, certain Unicode characters are valid in JSON but invalid in JavaScript source, and a JSONP response is JavaScript source, meant to be loaded through an HTML script tag. Further investigation suggests that only one character is a problem here: U+2029 must be represented as \u2029 instead of being left as-is.
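      One way to apply this fix is to post-process the serialized JSON before it is wrapped for JSONP. The Java sketch below assumes the writer has already produced a complete, valid JSON text; the class and method names (`JsonpSafe`, `toJavascriptSafe`) are hypothetical and are not taken from Solr or the attached patch. It escapes both U+2028 and U+2029, because JavaScript engines predating ECMAScript 2019 treat either one as a line terminator inside a string literal. A plain text replacement is safe here: JSON whitespace is limited to space, tab, LF and CR, so these two characters can only legally appear inside JSON string literals, where the escape sequence decodes back to the same character.

```java
/** Hypothetical helper: makes serialized JSON text safe to embed as JavaScript source. */
final class JsonpSafe {
    private JsonpSafe() {}

    /**
     * Replaces raw LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) with
     * their six-character escapes. Assumes the input is already valid JSON, so
     * both characters can only occur inside string literals.
     */
    static String toJavascriptSafe(String json) {
        return json.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029");
    }

    public static void main(String[] args) {
        // A made-up stored field value containing a raw PARAGRAPH SEPARATOR.
        String json = "{\"text\":\"para one\u2029para two\"}";
        System.out.println(toJavascriptSafe(json));
    }
}
```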

      1. SOLR-2520.patch
        2 kB
        Yonik Seeley

        Activity

        Robert Muir added a comment -

        Bulk close for 3.2

        Yonik Seeley added a comment -

        Committed to trunk and 3x.
        Thanks for bringing this to our attention, Benson!

        Yonik Seeley added a comment -

        Here's a patch w/ simple test.

        Benson Margulies added a comment -

        Yes, that looks like it.

        Yonik Seeley added a comment -

        It looks like we already escape \u2028 (see SOLR-1936), so we should just do the same for \u2029?
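        Treating U+2029 the same way as the already-escaped U+2028 can be sketched as a per-character string escaper. This is a minimal illustration under stated assumptions, not Solr's actual JSONResponseWriter code or the attached patch; the class name `JsonpEscaper` is hypothetical. It escapes what JSON requires (quote, backslash, control characters) plus the two Unicode separators that break JavaScript source.

```java
/** Hypothetical escaper for the body of a JSON string that must also be valid JavaScript. */
final class JsonpEscaper {
    private JsonpEscaper() {}

    /** Returns s escaped for use between the double quotes of a JSON string literal. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                // LINE SEPARATOR and PARAGRAPH SEPARATOR: legal raw in JSON strings,
                // but line terminators inside pre-ES2019 JavaScript string literals.
                case '\u2028': out.append("\\u2028"); break;
                case '\u2029': out.append("\\u2029"); break;
                default:
                    if (c < 0x20) {
                        // Any other control character must be escaped in JSON.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("\"" + escape("para one\u2029para two") + "\"");
    }
}
```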

        Benson Margulies added a comment -

        I'd vote for the latter. I assume there is a large population of people who currently use json.wrf=foo and who would benefit from the change. However, I have limited context here, so if anyone else knows more about how users are using this, I hope they will speak up. Sorry for not being fully clear on the first attempt.

        Hoss Man added a comment -

        Benson: thanks for the clarification. I've updated the summary to attempt to clarify the root of the issue.

        Would it make more sense to have a "JavascriptResponseWriter", or to have the JSONResponseWriter do Unicode escaping/stripping if/when json.wrf is specified?

        Benson Margulies added a comment -

        Fun happens when you specify something in json.wrf. This requests JSONP instead of JSON, so the response is treated as JavaScript rather than JSON. wt=json&json.wrf=SOME_PREFIX will cause Solr to respond with

        SOME_PREFIX({whatever it was otherwise going to return})

        instead of just

        {whatever it was otherwise going to return}

        If there is then an interesting Unicode character in there, Chrome implodes and Firefox quietly rejects it.
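        The two response shapes described in this comment can be sketched in Java. The `JsonpWrapper` class and its `render` method are hypothetical illustrations of what `wt=json&json.wrf=SOME_PREFIX` produces, not Solr's implementation, and the sample JSON body is made up. The point of the sketch is the comment in the JSONP branch: once wrapped, the text is evaluated as JavaScript by a script tag instead of being parsed as JSON, so any raw U+2028 or U+2029 must already be escaped before wrapping.

```java
/** Hypothetical illustration of plain JSON output versus JSONP output. */
final class JsonpWrapper {
    private JsonpWrapper() {}

    /** Returns json unchanged when no callback is given, otherwise callback(json). */
    static String render(String json, String callback) {
        if (callback == null || callback.isEmpty()) {
            // Plain JSON: the client hands this text to a JSON parser.
            return json;
        }
        // JSONP: this text becomes JavaScript source loaded via a <script> tag,
        // so it must be valid JavaScript, not merely valid JSON.
        return callback + "(" + json + ")";
    }

    public static void main(String[] args) {
        String json = "{\"response\":{\"numFound\":0,\"docs\":[]}}";
        System.out.println(render(json, null));
        System.out.println(render(json, "SOME_PREFIX"));
    }
}
```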

        Hoss Man added a comment -

        I'm confused here: as far as I can tell, the JSONResponseWriter does in fact output valid JSON (the linked article points out that there are control characters valid in JSON which are not valid in JavaScript, but JSON is what the response writer produces) ... so what is the bug?

        And what do you mean by "the query option to ask for jsonp"? ... I don't see that option in the JSONResponseWriter.

        (Is this bug about some third-party response writer?)


          People

          • Assignee:
            Unassigned
            Reporter:
            Benson Margulies
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
