Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
`binary_replace_slice` can produce invalid output when used with string types, because it slices by byte offset and can cut a multi-byte UTF-8 character in half. Given that there is `utf8_replace_slice`, I think `binary_replace_slice` should not support string types.
If a user actually wants to operate on the raw bytes of a string array, they should explicitly cast it to a binary type and then use `binary_replace_slice`.
>>> pc.binary_replace_slice(["hé"], 1, 2, "x")
<pyarrow.lib.StringArray object at 0x7fdbc09937c0>
[
  "hx�"
]
>>> pc.binary_replace_slice(["hé"], 1, 2, "x").validate(full=True)
Traceback (most recent call last):
  ...
ArrowInvalid: Invalid UTF8 sequence at string index 0
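To illustrate why the byte-level slice breaks the string, here is a minimal pure-Python sketch of the two slicing semantics. It does not use pyarrow: the helpers `replace_slice_bytes` and `replace_slice_chars` are hypothetical stand-ins that only mimic replacing a `[start, stop)` range, and they are not Arrow's implementation.

```python
def replace_slice_bytes(data: bytes, start: int, stop: int, replacement: bytes) -> bytes:
    """Hypothetical stand-in: replace the byte range [start, stop)."""
    return data[:start] + replacement + data[stop:]


def replace_slice_chars(text: str, start: int, stop: int, replacement: str) -> str:
    """Hypothetical stand-in: replace the code point range [start, stop)."""
    return text[:start] + replacement + text[stop:]


# "é" (U+00E9) is encoded as two bytes in UTF-8: 0xC3 0xA9.
raw = "h\u00e9".encode("utf-8")

# Replacing only byte 1 removes the lead byte 0xC3 and leaves the
# continuation byte 0xA9 dangling after the replacement.
byte_result = replace_slice_bytes(raw, 1, 2, b"x")

try:
    byte_result.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Replacing code point 1 swaps out the whole character instead.
char_result = replace_slice_chars("h\u00e9", 1, 2, "x")

print(byte_result, is_valid_utf8, char_result)
```

The stray continuation byte is what shows up as the replacement character in the output above, and it is why `validate(full=True)` rejects the array.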
Ref: https://github.com/apache/arrow/pull/14550#discussion_r1021545816
cc: apitrou