Hadoop Common · HADOOP-14525

org.apache.hadoop.io.Text Truncate


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.1
    • Fix Version/s: None
    • Component/s: io
    • Labels: None

    Description

      For Apache Hive, VARCHAR fields are much slower than STRING fields when a precision (a cap on string length) is specified. Keep in mind that this precision is the number of characters (code points) in the string, not the number of bytes in its UTF-8 encoding.

      The general procedure is:

      1. Load an entire byte buffer into a Text object
      2. Convert it to a String
      3. Count off the first N character code points
      4. Substring the String at the correct place
      5. Convert the String back into a byte array and populate the Text object
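      The steps above can be sketched as follows. This is an illustrative stand-in, not Hive's actual code; a plain byte[] represents the Text object's UTF-8 buffer, and the helper name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class TruncateViaString {
    // Steps 1-5 as done today: decode the UTF-8 bytes to a String, walk code
    // points, substring, then re-encode -- two full copies of the data.
    static byte[] truncate(byte[] utf8, int maxChars) {
        String s = new String(utf8, StandardCharsets.UTF_8);         // copy in
        int n = Math.min(maxChars, s.codePointCount(0, s.length()));
        int end = s.offsetByCodePoints(0, n);  // char index after n code points
        return s.substring(0, end).getBytes(StandardCharsets.UTF_8); // copy out
    }

    public static void main(String[] args) {
        byte[] in = "héllo \uD83D\uDE00 world".getBytes(StandardCharsets.UTF_8);
        // First 7 characters; the surrogate pair (emoji) counts as one.
        byte[] out = truncate(in, 7);
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```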

      It would be great if the Text object could offer a truncate/substring method based on character count that did not require copying data around. Along the same lines, a "getCharacterLength()" method may also be useful to determine if the precision has been exceeded.
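      A copy-free truncate could instead walk the UTF-8 bytes directly: find the byte offset where the first N code points end, then simply shrink the Text object's internal length to that offset. A minimal sketch of the byte-walking logic, under the assumption that the buffer holds valid UTF-8 (helper names are hypothetical; a real patch would live inside Text):

```java
public class Utf8Walk {
    // What a copy-free truncate(maxChars) could compute: the byte offset at
    // which the first maxChars code points end. No decoding, no copies --
    // UTF-8 lead bytes encode each sequence's length, so whole characters
    // can be skipped without inspecting continuation bytes.
    static int byteLengthOfChars(byte[] utf8, int len, int maxChars) {
        int i = 0, chars = 0;
        while (i < len && chars < maxChars) {
            int b = utf8[i] & 0xFF;
            if (b < 0x80)      i += 1;  // 1-byte sequence (ASCII)
            else if (b < 0xE0) i += 2;  // 2-byte sequence
            else if (b < 0xF0) i += 3;  // 3-byte sequence
            else               i += 4;  // 4-byte sequence
            chars++;
        }
        return Math.min(i, len);
    }

    // Sketch of the suggested getCharacterLength(): count every byte that
    // starts a new code point, i.e. every non-continuation byte.
    static int characterLength(byte[] utf8, int len) {
        int chars = 0;
        for (int i = 0; i < len; i++) {
            if ((utf8[i] & 0xC0) != 0x80) chars++;
        }
        return chars;
    }
}
```

      With helpers like these, checking whether a VARCHAR precision is exceeded and enforcing it become single passes over the bytes with no allocation, versus the two full copies of the String round-trip.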


People

    • Assignee: Unassigned
    • Reporter: David Mollitor (belugabehr)
    • Votes: 0
    • Watchers: 1

Dates

    • Created:
    • Updated: