Lucene - Core: LUCENE-5107

Convert all Properties#store() and load() to use UTF-8 charset

    Details

    • Type: Task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Follow-up of LUCENE-5106: all Properties#store() and load() calls need to be changed to the Reader/Writer variants, and the forbidden-API signatures need to be updated to disallow the InputStream/OutputStream variants and allow only Reader/Writer.
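      A minimal sketch of the Reader/Writer-only pattern, written against plain java.util.Properties and assuming Java 8+ (StandardCharsets, UncheckedIOException). The class name Utf8Properties and its methods are hypothetical, not Lucene's actual helper API. Byte streams are still accepted as parameters, but they are always wrapped in a Reader or Writer with an explicit UTF-8 charset before Properties sees them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: only the Reader/Writer overloads of Properties are used,
// so the charset is always UTF-8 instead of the ISO-8859-1 implied by the
// InputStream/OutputStream overloads.
final class Utf8Properties {
  private Utf8Properties() {}

  // Decodes the bytes as UTF-8, then parses them with Properties#load(Reader).
  // The caller keeps ownership of the stream, so it is not closed here.
  static Properties load(InputStream in) {
    Properties props = new Properties();
    try {
      Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
      props.load(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  // Writes the entries through Properties#store(Writer, String), which does not
  // escape non-ASCII characters in keys and values, and encodes the text as UTF-8.
  static void store(Properties props, OutputStream out, String comments) {
    try {
      Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
      props.store(writer, comments);
      writer.flush(); // store() already flushes; repeated to make the intent explicit
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

      Because the stream overloads of Properties are never called directly, a forbidden-API check banning Properties#load(InputStream) and Properties#store(OutputStream, String) would accept this code.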

      1. LUCENE-5107-4.4.patch
        26 kB
        Uwe Schindler
      2. LUCENE-5107.patch
        22 kB
        Uwe Schindler


          Activity

          Uwe Schindler added a comment (edited)

          In Lucene/Solr 4.5 we only allow UTF-8-encoded properties files, so the code uses Reader/Writer throughout. Files written by 4.4 and earlier with Unicode escapes can still be read, because Properties#load(Reader) decodes \uXXXX escapes, too. In fact, files written through the stream-based API (store(OutputStream)) are US-ASCII only, with every character above 127 escaped (see src.zip), so they can also be loaded by a UTF-8 decoder. The change therefore breaks no existing files.
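          The compatibility argument above can be checked directly. This sketch (hypothetical class LegacyPropertiesCompat, assuming Java 8+) writes properties the pre-change way with Properties#store(OutputStream, String), confirms the resulting bytes are pure ASCII, and reads them back through an InputStreamReader decoding UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical check that a legacy-format file reads back through a UTF-8 Reader.
final class LegacyPropertiesCompat {
  private LegacyPropertiesCompat() {}

  // Legacy write path: Properties#store(OutputStream, String) escapes non-ASCII
  // characters, so the resulting bytes are plain US-ASCII.
  static byte[] storeLegacy(Properties props) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      props.store(out, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  // True if every byte is in the 7-bit ASCII range (Java bytes are signed,
  // so values 128..255 show up as negative numbers).
  static boolean isAscii(byte[] bytes) {
    for (byte b : bytes) {
      if (b < 0) {
        return false;
      }
    }
    return true;
  }

  // New read path: decode as UTF-8; Properties#load(Reader) still resolves
  // the Unicode escape sequences found in legacy files.
  static Properties loadUtf8(byte[] bytes) {
    Properties props = new Properties();
    try (Reader reader = new InputStreamReader(
        new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
      props.load(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }
}
```

          Since ASCII is a subset of UTF-8, the UTF-8 decoder sees exactly the characters the legacy writer produced, and load(Reader) turns the \uXXXX escapes back into the original characters.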

          Uwe Schindler added a comment

          Here is the patch. It preserves full backwards compatibility with properties files written by earlier Solr/Lucene versions.

          But it now allows UTF-8 to be put directly into properties files, and it no longer \u-escapes non-ASCII characters when writing them out.
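          To make the change in output visible, this sketch (hypothetical class StoreComparison, assuming Java 8+) stores the same entries through both Properties#store overloads and returns the resulting text. The stream overload emits non-ASCII characters as \uXXXX escapes; the Writer overload emits the raw characters and leaves the byte encoding to whatever Writer is passed in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical comparison of the two Properties#store overloads.
final class StoreComparison {
  private StoreComparison() {}

  // Old behaviour: store(OutputStream, String) writes ISO-8859-1 bytes and
  // escapes every non-ASCII character in keys and values.
  static String storeViaStream(Properties props) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      props.store(out, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
  }

  // New behaviour: store(Writer, String) writes keys and values unescaped;
  // the caller's Writer decides the byte encoding (UTF-8 after this change).
  static String storeViaWriter(Properties props) {
    StringWriter writer = new StringWriter();
    try {
      props.store(writer, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return writer.toString();
  }
}
```

          For a value such as "Müller", the stream variant yields a line like name=M\u00FCller, while the Writer variant yields name=Müller; in a UTF-8 file the ü then occupies two bytes.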

          ASF subversion and git services added a comment

          Commit 1502615 from Uwe Schindler
          [ https://svn.apache.org/r1502615 ]

          LUCENE-5107: Properties files by Lucene are now written in UTF-8 encoding, Unicode is no longer escaped. Reading of legacy properties files with \u escapes is still possible

          ASF subversion and git services added a comment

          Commit 1502622 from Uwe Schindler
          [ https://svn.apache.org/r1502622 ]

          Merged revision(s) 1502615 from lucene/dev/trunk:
          LUCENE-5107: Properties files by Lucene are now written in UTF-8 encoding, Unicode is no longer escaped. Reading of legacy properties files with \u escapes is still possible

          Uwe Schindler added a comment

          Patch for 4.4 (as the Solr code there is a little different).

          ASF subversion and git services added a comment

          Commit 1502632 from Uwe Schindler
          [ https://svn.apache.org/r1502632 ]

          Merged revision(s) 1502615 from lucene/dev/trunk:
          LUCENE-5107: Properties files by Lucene are now written in UTF-8 encoding, Unicode is no longer escaped. Reading of legacy properties files with \u escapes is still possible

          Uwe Schindler added a comment

          Committed to trunk, 4.x, and 4.4.

          Steve Rowe added a comment

          Bulk close resolved 4.4 issues


            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0
              Watchers:
              1
