Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: search
    • Labels:
      None
    • Environment:

      Darwin berlin.local 8.7.1 Darwin Kernel Version 8.7.1: Wed Jun 7 16:19:56 PDT 2006; root:xnu-792.9.72.obj~2/RELEASE_I386 i386 i386

      Description

      Tricia Williams reported problems with Cyrillic charsets when trying to search using the admin application, specifically NPEs and bad results.

      This patch fixes the webapp by specifying a character encoding for each of the admin pages.

      I also discovered a second issue: StrUtils wasn't encoding UTF-8 data properly, so I fixed that as well. I'm attaching two patches.
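To illustrate why an explicit character encoding matters here, the following is a minimal sketch (not taken from the patches) of what happens when UTF-8 bytes are decoded with a different charset. The class name `CharsetMismatchDemo` and the helper `decode` are hypothetical, and the sample Cyrillic word is an assumption chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {
    // Hypothetical helper: decode raw bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Cyrillic sample text, written with Unicode escapes so the source
        // file itself does not depend on any particular encoding.
        String original = "\u043F\u043E\u0438\u0441\u043A";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset that produced the bytes round-trips cleanly.
        String ok = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Decoding the same bytes as ISO-8859-1 yields one char per byte.
        String garbled = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(ok));
        System.out.println(original.equals(garbled));
        System.out.println(original.length() + " chars vs " + garbled.length() + " chars");
    }
}
```

When a page does not declare its encoding, the two sides of a request can end up disagreeing in the same way, which is one plausible route to the bad results described above.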

      1. patch-utf-8-problems.patch
        13 kB
        Philip Jacob
      2. patch-utf-8-problems-webapp.patch
        15 kB
        Philip Jacob

          Activity

          Yonik Seeley added a comment -

          Thanks Phil,

          Please don't make formatting changes in patches that make other changes though... it makes it very hard to see what has changed.

          Here is what it looks like these patches do:

          • admin JSP pages: changes charset
          • adds a space between the mime-type and charset
            Example: "text/xml;charset=UTF-8" => "text/xml; charset=UTF-8"
            Does this fix a bug, or is it just better style?
          • when logging requests, replaces partialURLEncodeVal with URLEncoder.encode()
            one reason I didn't use this originally was that I wanted queries easier to read... encode() is both slower and encodes many things that don't need to be encoded.
            I would much rather see q=title:"solr+search" in the log, not q=%22title%3Asolr%20search%22
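The readability concern can be checked with a small sketch of what the standard `java.net.URLEncoder.encode()` does to a query value. The class name `LogEncodingDemo` and the helper `fullyEncode` are hypothetical; only `URLEncoder.encode(String, String)` is the real JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class LogEncodingDemo {
    // Hypothetical helper: form-encode a value as UTF-8 with the JDK encoder.
    static String fullyEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String q = "title:\"solr search\"";
        // Prints: q=title%3A%22solr+search%22
        System.out.println("q=" + fullyEncode(q));
    }
}
```

The colon and the quotes are percent-encoded even though they are harmless in a log line, which is the readability cost being described. Note that `URLEncoder` writes a space as `+` rather than `%20`.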
          Philip Jacob added a comment -

          Hey Yonik,

          Correct on the admin pages. I specified UTF-8 for everything.

          I added the space after the semicolon, as in "text/xml; charset=UTF-8", to match the form shown in section 14.17 of HTTP/1.1:

          http://www.ietf.org/rfc/rfc2616.txt

          It's a small issue, but I noticed it and figured that I'd fix it.

          Using partialURLEncodeVal actually does cause bugs. The query string is written into the log files, and when the 'q' parameter contains UTF-8 data, it isn't escaped properly. So while URLEncoder.encode() may be slower, it does result in correct output being written by the logger.

          Yonik Seeley added a comment -

          I've committed most of the changes except the logging change, which I opened SOLR-36 for.
          It does seem like it would be useful to percent encode non-ascii chars when logging the query params.

          Thanks!
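One way the middle ground discussed in this thread might look is sketched below: keep printable ASCII readable and percent-encode only the bytes that would be ambiguous or unsafe in a log line. This is an illustration under stated assumptions, not the code from the patches or from SOLR-36; the class name `PartialLogEncoder`, the method `partialEncode`, and the exact set of escaped characters are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class PartialLogEncoder {
    // Hypothetical helper: encode a parameter value for logging.
    // Assumed policy: space becomes '+'; control bytes, non-ASCII bytes,
    // and the characters '%', '&' and '+' are percent-encoded; every other
    // printable ASCII character is passed through unchanged.
    static String partialEncode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c == ' ') {
                out.append('+');
            } else if (c < 0x20 || c >= 0x7F || c == '%' || c == '&' || c == '+') {
                out.append('%').append(String.format("%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: title:"solr+search"
        System.out.println(partialEncode("title:\"solr search\""));
        // Cyrillic input (Unicode escapes keep the source ASCII-only).
        // Prints: title:%D0%BF%D0%BE%D0%B8%D1%81%D0%BA
        System.out.println(partialEncode("title:\u043F\u043E\u0438\u0441\u043A"));
    }
}
```

Working on the UTF-8 bytes rather than on Java chars means each non-ASCII character is escaped as its full byte sequence, so the logged value stays plain ASCII while ordinary queries remain easy to read.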

          Hoss Man added a comment -

          This bug was modified as part of a bulk update using the criteria...

          • Marked ("Resolved" or "Closed") and "Fixed"
          • Had no "Fix Version" versions
          • Was listed in the CHANGES.txt for 1.1

          The Fix Version for all 38 issues found was set to 1.1, email notification
          was suppressed to prevent excessive email.

          For a list of all the issues modified, search jira comments for this
          (hopefully) unique string: 20080415hossman3


            People

            • Assignee:
              Unassigned
              Reporter:
              Philip Jacob
            • Votes:
              0
              Watchers:
              0
