Lucene - Core / LUCENE-3983

HTMLCharacterEntities.jflex uses String.toUpperCase without Locale

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Is this expected?

            "xi", "\u03BE", "yacute", "\u00FD", "yen", "\u00A5", "yuml", "\u00FF",
            "zeta", "\u03B6", "zwj", "\u200D", "zwnj", "\u200C"
          };
          for (int i = 0 ; i < entities.length ; i += 2) {
            Character value = entities[i + 1].charAt(0);
            entityValues.put(entities[i], value);
            if (upperCaseVariantsAccepted.contains(entities[i])) {
              entityValues.put(entities[i].toUpperCase(), value);
            }
          }
      

      In my opinion, this should look like:

            "xi", "\u03BE", "yacute", "\u00FD", "yen", "\u00A5", "yuml", "\u00FF",
            "zeta", "\u03B6", "zwj", "\u200D", "zwnj", "\u200C"
          };
          for (int i = 0 ; i < entities.length ; i += 2) {
            Character value = entities[i + 1].charAt(0);
            entityValues.put(entities[i], value);
            if (upperCaseVariantsAccepted.contains(entities[i])) {
              entityValues.put(entities[i].toUpperCase(Locale.ENGLISH), value);
            }
          }
      

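      Otherwise, in the Turkish locale, the upper-case variants of entities containing "i" (like "xi" -> '\u03BE') will not be detected correctly, because "i" upper-cases to the dotted capital "İ" (U+0130) there instead of "I".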
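
      To illustrate the difference, here is a minimal standalone sketch. It is not the code from the attached patch: the class name LocaleUpperCaseDemo and the helper register are hypothetical, and it simply assumes "xi" is one of the entities whose upper-case variant is accepted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LocaleUpperCaseDemo {

  /**
   * Hypothetical helper mirroring the loop body in the description:
   * registers an entity name and its upper-case variant, upper-casing
   * with the given locale.
   */
  public static Map<String, Character> register(String name, char value, Locale locale) {
    Map<String, Character> entityValues = new HashMap<>();
    entityValues.put(name, value);
    entityValues.put(name.toUpperCase(locale), value);
    return entityValues;
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr-TR");

    // Under Turkish rules "i" becomes U+0130 (dotted capital I),
    // so the key stored is not the ASCII string "XI".
    Map<String, Character> tr = register("xi", '\u03BE', turkish);
    System.out.println("Turkish map has XI: " + tr.containsKey("XI"));

    // With an explicit English locale the key is plain ASCII "XI".
    Map<String, Character> en = register("xi", '\u03BE', Locale.ENGLISH);
    System.out.println("English map has XI: " + en.containsKey("XI"));
  }
}
```

      The no-argument toUpperCase() uses the JVM's default locale, so the first case is what happens when the default locale is Turkish. Passing an explicit locale such as Locale.ENGLISH makes the result independent of the machine's settings.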

        Activity

        Field: Original Value -> New Value
        Uwe Schindler made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Steve Rowe made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Lucene Fields: New [ 10121 ] -> New,Patch Available [ 10121, 10120 ]
        Fix Version/s: (none) -> 4.0 [ 12314025 ]
        Resolution: (none) -> Fixed [ 1 ]
        Steve Rowe made changes -
        Attachment: (none) -> LUCENE-3983.patch [ 12528100 ]
        Steve Rowe made changes -
        Priority: Major [ 3 ] -> Minor [ 4 ]
        Uwe Schindler created issue -

          People

          • Assignee:
            Steve Rowe
            Reporter:
            Uwe Schindler
          • Votes:
            0
            Watchers:
            2
