Commons Lang / LANG-658

Some entities like Ö are not matched properly against their ISO8859-1 representation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: lang.text.translate.*
    • Labels:
      None

      Description

      In EntityArrays, in

          private static final String[][] ISO8859_1_ESCAPE

      some mappings are wrong, for example:

       
              {"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
              {"\u00D8", "&times;"}, // multiplication sign
      

      but this must be

       
              {"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
              {"\u00D7", "&times;"}, // multiplication sign
      

      according to http://www.fileformat.info/info/unicode/block/latin_supplement/list.htm

      First look:

      \u00CA is missing from the array, so every following entry is off by one code point.

      Found on http://stackoverflow.com/questions/4172784/bug-in-apache-commons-stringescapeutil/4172915#4172915
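
To make the impact concrete, here is a minimal sketch of a lookup-based escaper run against the wrong and the corrected pairs from the report. The class EscapeTableDemo and its escape method are hypothetical and are not the Commons Lang implementation; the entity strings are assumed to be the HTML 4 names &Ouml; and &times;.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EscapeTableDemo {
    // Two-entry excerpts only; the real ISO8859_1_ESCAPE table is much longer.
    static final String[][] BUGGY = {
        {"\u00D7", "&Ouml;"},  // wrong key: U+00D7 is the multiplication sign
        {"\u00D8", "&times;"}, // wrong key: U+00D8 is O with stroke
    };
    static final String[][] FIXED = {
        {"\u00D6", "&Ouml;"},  // uppercase O, umlaut
        {"\u00D7", "&times;"}, // multiplication sign
    };

    /** Replaces each character that has a table entry; other characters pass through. */
    static String escape(String input, String[][] table) {
        Map<String, String> lookup = new LinkedHashMap<>();
        for (String[] pair : table) {
            lookup.put(pair[0], pair[1]);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String ch = String.valueOf(input.charAt(i));
            out.append(lookup.getOrDefault(ch, ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\u00D6 \u00D7"; // O umlaut, space, multiplication sign
        System.out.println(escape(input, BUGGY)); // O umlaut is left alone, the sign becomes &Ouml;
        System.out.println(escape(input, FIXED)); // &Ouml; &times;
    }
}
```

With the shifted keys, the umlaut passes through unescaped and the multiplication sign is emitted as &Ouml;, which matches the symptom described in the linked Stack Overflow question.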

          Activity

          Michael Konietzka created issue
          Sebb added a comment (edited):

          Later on, there are two instances of E5:

           
                  {"\u00E5", "&auml;"}, // ä - lowercase a, umlaut
                  {"\u00E5", "&aring;"}, // å - lowercase a, ring
          

          The latter is correct, and subsequent entries seem OK.

          Sebb added a comment (edited):

          Another duplicate entry:

           
                  {"\u00F1", "&ntilde;"}, // ñ - lowercase n, tilde
                  {"\u00F3", "&ograve;"}, // ò - lowercase o, grave accent
                  {"\u00F3", "&oacute;"}, // ó - lowercase o, acute accent
          

          The first F3 entry should be F2.
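
Duplicates and gaps of this kind can be found mechanically. The following is a rough sketch, assuming the table is meant to be an ascending, contiguous run of single-character keys (which holds for the Latin-1 range U+00A0 to U+00FF). The class EscapeTableCheck and the EXCERPT data are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeTableCheck {
    // Excerpt mirroring the entries quoted in the comment above.
    static final String[][] EXCERPT = {
        {"\u00F1", "&ntilde;"},
        {"\u00F3", "&ograve;"}, // should be U+00F2
        {"\u00F3", "&oacute;"},
    };

    /** Reports every key that does not follow its predecessor by exactly one code point. */
    static List<String> findProblems(String[][] table) {
        List<String> problems = new ArrayList<>();
        for (int i = 1; i < table.length; i++) {
            int previous = table[i - 1][0].charAt(0);
            int current = table[i][0].charAt(0);
            if (current == previous) {
                problems.add(String.format("duplicate key U+%04X at index %d", current, i));
            } else if (current != previous + 1) {
                problems.add(String.format(
                        "gap or misorder between U+%04X and U+%04X at index %d",
                        previous, current, i));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        for (String problem : findProblems(EXCERPT)) {
            System.out.println(problem);
        }
    }
}
```

A check like this only flags duplicates and jumps in the key sequence. It cannot tell whether a key is paired with the right entity name, so a table that is shifted consistently from its first entry would still pass.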

          Sebb added a comment:

          Now hopefully fixed.

          Sebb made changes:
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Fix Version/s: 3.0 [ 12311714 ]
          Resolution: Fixed [ 1 ]
          Sebb added a comment:

          Note: I ran a check comparing the values against the ones from lang2 Entities, and the two implementations now seem to agree.
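
The comment does not say how that check was run. One possible shape for such a comparison is sketched below, assuming both data sets can be expressed as {character, entity} pairs. TableComparison, REFERENCE and CANDIDATE are hypothetical names, and REFERENCE is only a stand-in for the lang2 data, which the lang2 Entities class stores in a different form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TableComparison {
    // Stand-in for the trusted data (hypothetical excerpt).
    static final String[][] REFERENCE = {
        {"\u00E4", "&auml;"},
        {"\u00E5", "&aring;"},
    };
    // Excerpt with the duplicated E5 key quoted earlier in this issue.
    static final String[][] CANDIDATE = {
        {"\u00E5", "&auml;"},
        {"\u00E5", "&aring;"},
    };

    /** Indexes a table by entity name, so duplicate character keys cannot hide each other. */
    static Map<String, String> byEntity(String[][] table) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String[] pair : table) {
            map.put(pair[1], pair[0]);
        }
        return map;
    }

    /** Returns the entity names whose character differs from, or is missing in, the candidate. */
    static List<String> mismatches(String[][] reference, String[][] candidate) {
        Map<String, String> expected = byEntity(reference);
        Map<String, String> actual = byEntity(candidate);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            if (!entry.getValue().equals(actual.get(entry.getKey()))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mismatches(REFERENCE, CANDIDATE)); // [&auml;]
    }
}
```

Indexing by entity name rather than by character is a deliberate choice here: with a duplicated key such as E5, a character-keyed map would silently keep only the last pair.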

          Michael Konietzka made changes:
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          Mark Thomas made changes:
          Workflow: jira [ 12526751 ] -> Default workflow, editable Closed status [ 12602580 ]
          Sebb made changes:
          Link: This issue is duplicated by LANG-705 [ LANG-705 ]

            People

            • Assignee: Unassigned
            • Reporter: Michael Konietzka
            • Votes: 0
            • Watchers: 1
