Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 2.8.1
- None
- None
Description
In Apache Hive, VARCHAR fields are much slower than STRING fields when a precision (a cap on the string length) is specified. Keep in mind that this precision counts the characters (Unicode code points) in the UTF-8 encoded string, not the number of bytes.
The general procedure for enforcing the cap is:
- Load an entire byte buffer into a Text object
- Convert it to a String
- Count N character code points
- Truncate the String at that position with a substring call
- Convert the String back into a byte array and populate the Text object
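To make the cost concrete, the steps above can be sketched with only the Java standard library. This is an illustration, not Hive's actual code: the class and method names (VarcharRoundTrip, truncate) are hypothetical, and a plain byte array stands in for the contents of the Text object.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the decode / count / substring / re-encode round trip.
final class VarcharRoundTrip {

    /** Returns the UTF-8 bytes cut to at most maxChars code points. */
    static byte[] truncate(byte[] utf8, int maxChars) {
        // Decode the whole buffer into a String (first full copy).
        String s = new String(utf8, StandardCharsets.UTF_8);

        // Count code points rather than UTF-16 chars, so that a
        // supplementary character counts as one.
        if (s.codePointCount(0, s.length()) <= maxChars) {
            return utf8; // already within the cap
        }

        // Find the UTF-16 index that sits maxChars code points from the start.
        int end = s.offsetByCodePoints(0, maxChars);

        // Substring and re-encode to UTF-8 (two more copies).
        return s.substring(0, end).getBytes(StandardCharsets.UTF_8);
    }
}
```

Every value pays for a full decode even when it is already short enough, and an over-long value is copied three times.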
It would be great if the Text object could offer a truncate/substring method based on character count that did not require copying data around. Along the same lines, a "getCharacterLength()" method may also be useful to determine if the precision has been exceeded.
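A rough sketch of what such methods could do, working directly on the UTF-8 bytes: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new code point, so both the character count and the truncation offset can be found in one pass with no decoding. The names below (Utf8Lengths, characterLength, truncatedByteLength) are hypothetical and not part of the existing Text API, and the code assumes the input is well-formed UTF-8.

```java
// Hypothetical helpers; not an existing Hadoop API. Assumes valid UTF-8 input.
final class Utf8Lengths {

    /** Counts code points in utf8[start, start + len) without decoding. */
    static int characterLength(byte[] utf8, int start, int len) {
        int count = 0;
        for (int i = start; i < start + len; i++) {
            // Continuation bytes look like 10xxxxxx; any other byte begins a code point.
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns how many bytes of utf8[start, start + len) hold the first
     * maxChars code points, or len if the range has maxChars or fewer.
     */
    static int truncatedByteLength(byte[] utf8, int start, int len, int maxChars) {
        int seen = 0;
        for (int i = start; i < start + len; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                if (seen == maxChars) {
                    return i - start; // first byte of the code point past the cap
                }
                seen++;
            }
        }
        return len;
    }
}
```

With a Text value, the array and valid length would presumably come from getBytes() and getLength(), and truncation would then only require shrinking the recorded length rather than copying the data.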
Issue Links
- is depended upon by HIVE-16889 Improve Performance Of VARCHAR (Open)