Apache Arrow / ARROW-7404

[C++][Gandiva] Fix utf8 char length error on Arm64


Details

    • Patch

    Description

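      The current code checks whether a UTF-8 eight-bit code unit is within
      0x00~0x7F using "if (c >= 0)", where c is declared as "char". This check
      assumes that char is always signed, which is not true[1]. On Arm64, char
      is unsigned by default, so "c >= 0" is always true and bytes 0x80~0xFF
      (the lead and continuation bytes of multi-byte sequences) also pass the
      ASCII check. This breaks the UTF-8 character length computation and
      causes some Gandiva unit tests to fail.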

      Fix it by casting to "signed char" explicitly.
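      The sketch below illustrates the failure mode and the fix. IsAsciiBuggy,
      IsAsciiFixed, Utf8CharBytes and Utf8Length are hypothetical names chosen
      for illustration, not Gandiva's actual functions, and the length logic
      assumes well-formed UTF-8 input.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the original check. Where plain char is
// unsigned (Arm64 by default), c >= 0 is always true, so bytes 0x80..0xFF
// are misclassified as single-byte ASCII characters.
constexpr bool IsAsciiBuggy(char c) { return c >= 0; }

// Hypothetical helper mirroring the fix: converting to signed char maps bytes
// 0x80..0xFF to negative values regardless of plain char's signedness
// (two's-complement wraparound: guaranteed since C++20, and what mainstream
// compilers already did before that).
constexpr bool IsAsciiFixed(char c) { return static_cast<signed char>(c) >= 0; }

// Number of bytes in the UTF-8 sequence that starts with lead byte c.
// Assumes well-formed UTF-8 input.
constexpr int Utf8CharBytes(char c) {
  if (IsAsciiFixed(c)) return 1;  // 0xxxxxxx
  const unsigned char u = static_cast<unsigned char>(c);
  if ((u & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((u & 0xF0) == 0xE0) return 3;  // 1110xxxx
  return 4;                          // 11110xxx
}

// Counts UTF-8 characters in data[0, len) by stepping over whole sequences.
constexpr std::size_t Utf8Length(const char* data, std::size_t len) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < len;
       i += static_cast<std::size_t>(Utf8CharBytes(data[i]))) {
    ++count;
  }
  return count;
}
```

      With IsAsciiFixed, Utf8Length("h\xC3\xA9llo", 6) is 5 on both signed-
      and unsigned-char targets. An equivalent formulation converts to
      unsigned char and tests "< 0x80", which avoids the signed conversion
      entirely.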

      [1] Quoted from https://en.cppreference.com/w/cpp/language/types
      The signedness of char depends on the compiler and the target platform:
      the defaults for ARM and PowerPC are typically unsigned, the defaults
      for x86 and x64 are typically signed.
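      A minimal compile-time check of what a given toolchain does, assuming a
      C++11 or later compiler; kCharIsSigned is an illustrative name, not part
      of any library.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// Whether plain char is signed for the current compiler and target:
// typically true on x86/x64 and false on Arm and PowerPC.
constexpr bool kCharIsSigned = std::numeric_limits<char>::is_signed;

// Equivalent ways to ask the same question; they must agree.
static_assert(kCharIsSigned == std::is_signed<char>::value, "traits disagree");
static_assert(kCharIsSigned == (CHAR_MIN < 0), "CHAR_MIN disagrees");

// Unlike plain char, signed char and unsigned char have fixed signedness,
// which is why an explicit cast gives the same result on every target.
static_assert(std::is_signed<signed char>::value, "signed char is signed");
static_assert(std::is_unsigned<unsigned char>::value, "unsigned char is unsigned");
```

      GCC and Clang also accept -fsigned-char and -funsigned-char to override
      the target default, but code that depends on such a flag is fragile; an
      explicit cast in the source does not depend on build settings.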


    People

      Assignee: Yibo Cai (yibocai)
      Reporter: Yibo Cai (yibocai)
      Votes: 0
      Watchers: 3


    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 40m