Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
- None
Description
There is more to this issue than meets the eye. stringr::str_to_sentence() does two things:
- capitalises the first word of the string
- if multiple sentences are provided as a single string, attempts to find the sentence breaks and capitalises the first word of each sentence.
The stringr implementation wraps stringi::stri_trans_totitle(), which in turn uses ICU’s BreakIterator to locate specific text boundaries. As a consequence, stringr::str_to_sentence() is not able to identify a full stop / period (".") as a sentence end and does not capitalise the word following it. Thus, the behaviour of the utf8_capitalize kernel (which capitalises the first word of a string without making any attempt to break it into sentences) differs from the behaviour of stringr::str_to_sentence().
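To make the two behaviours concrete, here is a minimal Python sketch using only the standard library. It is not the Arrow kernel, stringr, or ICU: `capitalize_first` only approximates a "first word only" capitalisation, and `capitalize_sentences` uses a hypothetical, deliberately naive regex (sentence end = ".", "!" or "?" followed by whitespace) in place of ICU's sentence break rules. Real stringr / ICU output may differ from what this sketch produces.

```python
import re


def capitalize_first(text: str) -> str:
    """Capitalise only the first character of the whole string.

    Approximates a kernel that makes no attempt to split the
    input into sentences (assumption: the rest is lower-cased).
    """
    return text[:1].upper() + text[1:].lower()


# Hypothetical boundary rule: whitespace that follows '.', '!' or '?'.
# This is an illustration only, not ICU's BreakIterator logic.
_NAIVE_BOUNDARY = re.compile(r"(?<=[.!?])(\s+)")


def capitalize_sentences(text: str) -> str:
    """Capitalise the first character of every naively detected sentence."""
    # The capturing group keeps the separating whitespace, so the
    # result alternates: sentence, separator, sentence, separator, ...
    parts = _NAIVE_BOUNDARY.split(text)
    for i in range(0, len(parts), 2):  # even indices hold sentences
        parts[i] = capitalize_first(parts[i])
    return "".join(parts)


sample = "this is one. this is two"
first_only = capitalize_first(sample)
per_sentence = capitalize_sentences(sample)
print(first_only)
print(per_sentence)
```

The gap between the two printed strings is the kind of discrepancy described above: a binding that maps a sentence-case function onto a "first word only" kernel would not match a sentence-aware implementation on multi-sentence input.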
For more extensive discussions around the stringi / stringr implementation see stringr issues 202 and 231.
Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.
Issue Links
- depends upon: ARROW-12944 [C++] String capitalize kernel (Resolved)