Xerces-C++ / XERCESC-1390

Regular expressions with unions do not work properly with replacing and tokenizing.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: Utilities
    • Labels: None

    Description

      Consider the following regular expression:

      "(ab) | (a)"

      with the following input string:

      "abracadabra"

      If you use an instance of the RegularExpression class to replace every matching substring with the empty string, the result should be the following string:

      "rcdr"

      Instead, just the last "a" in the string is replaced:

      "abracadabr"

      If you use the same RegularExpression instance to tokenize the input string, the result should be the following sequence of strings:

      ""
      "r"
      "c"
      "d"
      "r"
      ""

      Instead, the result is:

      "abracadabr"
      ""

      I will attach a proposed patch, but I don't know this code well, so it would be great if someone could review it.
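
The expected results above can be reproduced with a small stand-alone sketch. It does not use the Xerces-C++ RegularExpression class; it assumes std::regex (ECMAScript grammar) as a reference engine and writes the pattern without the surrounding spaces, as "(ab)|(a)". The helper names removeMatches and splitOnMatches are hypothetical and exist only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: delete every match of `re` from `input`.
std::string removeMatches(const std::string& input, const std::regex& re) {
    return std::regex_replace(input, re, "");
}

// Hypothetical helper: split `input` on every match of `re`.
// Empty tokens before the first match and after the last match are kept,
// which is the behavior the report expects from tokenizing.
std::vector<std::string> splitOnMatches(const std::string& input, const std::regex& re) {
    std::vector<std::string> tokens;
    auto last = input.cbegin();
    for (std::sregex_iterator it(input.cbegin(), input.cend(), re), end; it != end; ++it) {
        tokens.emplace_back(last, (*it)[0].first);  // text between the previous match and this one
        last = (*it)[0].second;                     // resume after this match
    }
    tokens.emplace_back(last, input.cend());        // remainder after the final match
    return tokens;
}
```

With the pattern "(ab)|(a)" and the input "abracadabra", removeMatches should yield "rcdr" and splitOnMatches should yield the six tokens listed in the description, which gives a baseline to compare a patched RegularExpression against.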

      Attachments

        1. patch.txt
          0.4 kB
          David N Bertoni

          People

            Assignee: David N Bertoni (dbertoni)
            Reporter: David N Bertoni (dbertoni)