Description
The Hadoop Text class wraps a byte array containing a UTF-8 encoded string, but there is no way to quickly get the length of that string. One can get the number of bytes in the byte array, but determining the String length requires decoding the bytes first. In the simple example below, which sorts Text objects by String length, the Text is decoded into a String repeatedly, once per comparison. This was brought to my attention by HIVE-23870.
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.Text;

public static void main(String[] args) {
  List<Text> list = Arrays.asList(new Text("1"), new Text("22"), new Text("333"));
  // Every comparison decodes both Text byte arrays into Strings just to compare their lengths.
  list.sort((Text t1, Text t2) -> t1.toString().length() - t2.toString().length());
}
This would also be helpful if I want to repeatedly check the last character of a Text object:
Text t = new Text("4444"); System.out.println(t.charAt(t.toString().length() - 1));
Issue Links
- is related to: HIVE-23870 Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable (Closed)