Apache Arrow: ARROW-7404

[C++][Gandiva] Fix utf8 char length error on Arm64


Details

    • Patch

    Description

      The current code checks whether a UTF-8 code unit (one byte) is in the
      range 0x00 to 0x7F with "if (c >= 0)", where c is declared as "char".
      This check assumes that char is always signed, which is not
      guaranteed[1]. On Arm64, char is unsigned by default, so the comparison
      is always true, bytes 0x80 to 0xFF are wrongly treated as if they were
      in that range, and some Gandiva unit tests fail.
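
      The following is a minimal sketch of the failure mode, not Gandiva
      code. It assumes plain char behaves like signed char on x86/x64 and
      like unsigned char on Arm64, as described in [1], and models each case
      with a template parameter so both behaviors can be observed on one
      machine. LooksLikeAscii, CountAsciiBytes and CheckNaiveAsciiExample are
      hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the original check: a code unit is treated as being
// in 0x00 to 0x7F when "c >= 0". CharT stands in for plain char: signed char
// models the usual x86/x64 default, unsigned char the Arm64 default.
template <typename CharT>
bool LooksLikeAscii(CharT c) {
  return c >= 0;  // always true when CharT is unsigned
}

// Counts the bytes of s that the naive check accepts as 0x00 to 0x7F.
template <typename CharT>
std::size_t CountAsciiBytes(const std::string& s) {
  std::size_t n = 0;
  for (char ch : s) {
    // View the raw byte through CharT, as the compiler does for plain char.
    if (LooksLikeAscii(static_cast<CharT>(ch))) ++n;
  }
  return n;
}

void CheckNaiveAsciiExample() {
  // 'h', U+00E9 encoded as 0xC3 0xA9, then "llo": 6 bytes, 5 characters.
  const std::string s = "h\xC3\xA9llo";
  assert(CountAsciiBytes<signed char>(s) == 4);    // 0xC3, 0xA9 are negative
  assert(CountAsciiBytes<unsigned char>(s) == 6);  // every byte passes
}
```

      With the unsigned model, the two bytes of the multi-byte character are
      both accepted as if they were in 0x00 to 0x7F, which is the kind of
      misclassification that breaks character-length calculations.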

      Fix it by explicitly casting c to "signed char" before the comparison.
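
      A sketch of the fixed check, assuming a conventional UTF-8 lead-byte
      classification. Utf8CharLength, Utf8Length and CheckUtf8LengthExample
      are hypothetical helpers, not the actual Gandiva functions; only the
      explicit signed char cast comes from the fix described in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the byte length of the UTF-8 sequence that
// starts with lead byte c, or 0 if c cannot start a sequence.
int32_t Utf8CharLength(char c) {
  // Explicit cast so the sign test behaves the same whether plain char is
  // signed (typical on x86/x64) or unsigned (typical on Arm64).
  if (static_cast<signed char>(c) >= 0) return 1;  // 0x00 to 0x7F
  const unsigned char u = static_cast<unsigned char>(c);
  if ((u & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((u & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((u & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;  // continuation byte (10xxxxxx) or invalid lead byte
}

// Counts characters by jumping from lead byte to lead byte. Returns -1 on an
// invalid or truncated sequence; continuation bytes are not validated.
int32_t Utf8Length(const std::string& s) {
  int32_t count = 0;
  for (std::size_t i = 0; i < s.size();) {
    const int32_t len = Utf8CharLength(s[i]);
    if (len == 0 || i + static_cast<std::size_t>(len) > s.size()) return -1;
    i += static_cast<std::size_t>(len);
    ++count;
  }
  return count;
}

void CheckUtf8LengthExample() {
  assert(Utf8CharLength('A') == 1);
  assert(Utf8CharLength('\xC3') == 2);      // lead byte of U+00E9
  assert(Utf8Length("h\xC3\xA9llo") == 5);  // 6 bytes, 5 characters
  assert(Utf8Length("\xE2\x82\xAC") == 1);  // U+20AC, 3 bytes
  assert(Utf8Length("\xC3") == -1);         // truncated sequence
}
```

      A design note: converting a byte above 0x7F to signed char was
      implementation-defined before C++20 (C++20 defines it as modular), and
      GCC and Clang already document it as wrapping. Testing the high bit
      instead, as in (static_cast<unsigned char>(c) & 0x80) == 0, gives the
      same answer without relying on that conversion.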

      [1] Quoted from https://en.cppreference.com/w/cpp/language/types:
      The signedness of char depends on the compiler and the target platform:
      the defaults for ARM and PowerPC are typically unsigned, the defaults
      for x86 and x64 are typically signed.
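
      To check which default a given compiler and target actually use, a
      sketch like the one below can be compiled there. It relies only on the
      standard std::numeric_limits<char>::is_signed trait and the CHAR_MIN
      macro; kPlainCharIsSigned, PlainCharIsSignedAtRuntime and
      CheckCharSignednessExample are hypothetical names. GCC and Clang also
      accept -fsigned-char and -funsigned-char to override the default, which
      can help reproduce the Arm64 behavior on an x86 machine.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical constant: true when plain char is signed for this
// compiler/target, false when it is unsigned.
constexpr bool kPlainCharIsSigned = std::numeric_limits<char>::is_signed;

// CHAR_MIN is 0 when char is unsigned and SCHAR_MIN when it is signed,
// so it must agree with the trait above.
static_assert(kPlainCharIsSigned == (CHAR_MIN < 0),
              "numeric_limits<char> and CHAR_MIN disagree");

// Unlike plain char, signed char and unsigned char have a fixed signedness,
// which is why an explicit cast makes a ">= 0" test portable.
static_assert(std::numeric_limits<signed char>::is_signed, "");
static_assert(!std::numeric_limits<unsigned char>::is_signed, "");

// Runtime probe: if plain char is unsigned, -1 converts to CHAR_MAX
// (255 with 8-bit bytes), which is not negative.
bool PlainCharIsSignedAtRuntime() {
  return static_cast<char>(-1) < 0;
}

void CheckCharSignednessExample() {
  assert(PlainCharIsSignedAtRuntime() == kPlainCharIsSigned);
}
```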

          People

            Yibo Cai (yibocai)

              Time Tracking

                Original estimate: not specified
                Remaining: 0h
                Logged: 40m
