Apache Arrow / ARROW-7404

[C++][Gandiva] Fix utf8 char length error on Arm64

    Details

    • Flags:
      Patch

      Description

      The current code checks whether a UTF-8 code unit (one 8-bit byte) is
      within 0x00 to 0x7F with "if (c >= 0)", where c is declared as "char".
      This check assumes that char is always signed, which is not true[1].
      On Arm64, char is unsigned by default, so the condition is always true
      and some Gandiva unit tests fail.

      Fix it by casting to "signed char" explicitly.

      [1] Cited from https://en.cppreference.com/w/cpp/language/types
      The signedness of char depends on the compiler and the target platform:
      the defaults for ARM and PowerPC are typically unsigned, the defaults
      for x86 and x64 are typically signed.
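
      The check and its fix can be sketched as follows. This is only an
      illustration under stated assumptions: the function names is_ascii and
      utf8_char_length are hypothetical and are not taken from the Gandiva
      sources, and the lead-byte decoding is the standard UTF-8 layout rather
      than a copy of the patched code.

```cpp
// Non-portable form described in the issue (shown as a comment only):
//   if (c >= 0) { /* treated as 0x00..0x7F */ }
// When plain char is unsigned (the Arm64 default), this is always true.

// Hypothetical helper: true when the code unit is in 0x00..0x7F.
// Casting to signed char makes the sign test independent of the
// platform's default signedness of char.
inline bool is_ascii(char c) {
  return static_cast<signed char>(c) >= 0;
}

// Hypothetical helper: byte length of a UTF-8 character, derived from its
// lead byte. Returns 0 for a byte that cannot start a character.
inline int utf8_char_length(char c) {
  if (is_ascii(c)) return 1;                 // 0xxxxxxx
  const unsigned char u = static_cast<unsigned char>(c);
  if ((u & 0xE0) == 0xC0) return 2;          // 110xxxxx
  if ((u & 0xF0) == 0xE0) return 3;          // 1110xxxx
  if ((u & 0xF8) == 0xF0) return 4;          // 11110xxx
  return 0;                                  // continuation or invalid byte
}
```

      An equivalent approach that avoids relying on signedness at all is to
      test the high bit on an unsigned value, for example
      "(static_cast<unsigned char>(c) & 0x80) == 0".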

              People

              • Assignee:
                yibo Yibo Cai
                Reporter:
                yibo Yibo Cai
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  40m