  Apache Drill / DRILL-4688

String functions may produce incorrect results when the input contains multi-byte characters


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels: None

    Description

      As discussed in DRILL-4573, the patch for DRILL-4573 causes a query-correctness regression when the input, encoded as UTF-8, contains multi-byte characters.

      For example, with the DRILL-4573 patch applied, matching a string against itself returns false:

      select regexp_matches('München', 'München') res3 from (values(1));
      +-------+
      | res3  |
      +-------+
      | false |
      +-------+
      

      Before the DRILL-4573 patch, the same query returned the correct result:

      select regexp_matches('München', 'München') res3 from (values(1));
      +-------+
      | res3  |
      +-------+
      | true  |
      +-------+
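The following Java sketch illustrates one plausible mechanism for this kind of failure. It assumes that the optimized code path exposes each UTF-8 byte of the input as a separate `char` instead of decoding the bytes. The class and method names (`BytePerCharSequence`, `matchesDecoded`, `matchesBytePerChar`) are hypothetical and do not come from Drill's codebase. In UTF-8, 'ü' (U+00FC) occupies two bytes, so 'München' is 7 characters but 8 bytes. A byte-per-char view therefore never matches the 7-character pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class MultiByteDemo {
    // Hypothetical zero-copy view: exposes each UTF-8 byte as one char.
    // This is only correct when every byte is ASCII (< 0x80).
    static final class BytePerCharSequence implements CharSequence {
        private final byte[] bytes;
        private final int start;
        private final int end;

        BytePerCharSequence(byte[] bytes, int start, int end) {
            this.bytes = bytes;
            this.start = start;
            this.end = end;
        }

        @Override public int length() { return end - start; }

        @Override public char charAt(int index) {
            // Widen the byte without sign extension; no UTF-8 decoding happens.
            return (char) (bytes[start + index] & 0xFF);
        }

        @Override public CharSequence subSequence(int s, int e) {
            return new BytePerCharSequence(bytes, start + s, start + e);
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder(length());
            for (int i = 0; i < length(); i++) sb.append(charAt(i));
            return sb.toString();
        }
    }

    // Correct: decode the stored UTF-8 bytes to Java chars before matching.
    static boolean matchesDecoded(String input, String regex) {
        byte[] utf8 = input.getBytes(StandardCharsets.UTF_8); // simulate bytes in a buffer
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return Pattern.compile(regex).matcher(decoded).matches();
    }

    // Broken for non-ASCII input: match against the byte-per-char view.
    static boolean matchesBytePerChar(String input, String regex) {
        byte[] utf8 = input.getBytes(StandardCharsets.UTF_8);
        return Pattern.compile(regex)
                .matcher(new BytePerCharSequence(utf8, 0, utf8.length))
                .matches();
    }

    public static void main(String[] args) {
        String s = "M\u00fcnchen"; // "München"; U+00FC encodes as 2 bytes in UTF-8
        System.out.println(s.length());                                // 7
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 8
        System.out.println(matchesDecoded(s, s));                      // true
        System.out.println(matchesBytePerChar(s, s));                  // false
    }
}
```

For pure ASCII input such as 'Munich', both paths agree. This explains why ASCII-only tests would not expose the problem.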
      

      Once this issue has been resolved, QA will add functional test cases that cover multi-byte characters, so that similar regressions are caught by the test suite in the future.
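One general way to keep a fast path for plain ASCII while staying correct for other input is to scan the bytes first and decode as UTF-8 only when some byte has its high bit set. The sketch below assumes this approach for illustration; it is not necessarily what the fix for this issue does, and `toCharSequence`, `selfMatches`, and `INPUTS` are hypothetical names. The input list mirrors the kind of coverage the description asks for: characters that take 1, 2, 3 and 4 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class MultiByteCoverage {
    // Hypothetical helper: returns a CharSequence for UTF-8 bytes. The cheap
    // one-byte-per-char decoding is used only when every byte is ASCII.
    static CharSequence toCharSequence(byte[] utf8) {
        for (byte b : utf8) {
            if (b < 0) { // high bit set: byte belongs to a multi-byte sequence
                return new String(utf8, StandardCharsets.UTF_8);
            }
        }
        // All ASCII: one byte equals one char, so US-ASCII decoding is exact.
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    // Inputs covering each UTF-8 sequence length.
    static final String[] INPUTS = {
        "Munich",        // 1-byte (ASCII) characters only
        "M\u00fcnchen",  // U+00FC: 2 bytes in UTF-8
        "\u6771\u4eac",  // CJK characters: 3 bytes each
        "\uD83D\uDE00"   // U+1F600 (surrogate pair in Java): 4 bytes
    };

    // A string must match itself as a literal pattern, and the decoded
    // length must equal the original char count.
    static boolean selfMatches(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        CharSequence cs = toCharSequence(utf8);
        return cs.length() == s.length()
                && Pattern.compile(Pattern.quote(s)).matcher(cs).matches();
    }

    public static void main(String[] args) {
        for (String s : INPUTS) {
            int bytes = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(bytes + " bytes -> " + selfMatches(s));
        }
    }
}
```

In Drill itself, the equivalent functional tests would presumably be SQL queries like the `regexp_matches` example above, run over inputs with the same range of multi-byte characters.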


            People

              Assignee: Unassigned
              Reporter: Jinfeng Ni (jni)
              Votes: 0
              Watchers: 1
