Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.1.1, 3.0.0
- Fix Version/s: None
- Component/s: None
Description
Organizations often use tools that create table schemas on the fly and specify VARCHAR columns with a precision. In these scenarios, performance suffers, even though one might expect it to improve: the maximum size of the data is known in advance, so buffers could be set up more efficiently than in the case where no such knowledge exists.
Most of the performance cost appears to come from reading a STRING from a file into a byte buffer, checking the character length of the STRING, truncating it if needed, and then serializing it back into bytes again.
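The sketch below is a minimal illustration of that round trip in plain Java. It is not the actual Hive code; the class and method names are invented for illustration only.
{code:java}
import java.nio.charset.StandardCharsets;

// Illustrative sketch (not the actual Hive code) of the round trip described above.
public class VarcharRoundTripSketch {
    public static byte[] enforceMaxLength(byte[] utf8Bytes, int maxChars) {
        // 1. Deserialize the raw bytes into a String (allocates a char[]).
        String s = new String(utf8Bytes, StandardCharsets.UTF_8);

        // 2. Check the character length and truncate if it exceeds the declared precision.
        if (s.codePointCount(0, s.length()) > maxChars) {
            s = s.substring(0, s.offsetByCodePoints(0, maxChars));
        }

        // 3. Serialize the (possibly truncated) String back into bytes (allocates again).
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
{code}
In this sketch, both allocations and the full decode/encode happen even when the value already fits within the declared length, which is the kind of overhead described above.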
From the code, I have identified several areas where developers left notes about possible later improvements (a hypothetical byte-level alternative is sketched after the list):
- org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
- org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
- org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)
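One hypothetical direction, in line with the dependency on HADOOP-14525 (truncation support in org.apache.hadoop.io.Text), would be to count UTF-8 code points directly on the serialized bytes so that values which already fit are never decoded. The sketch below is an assumption: it is not an existing Hive or Hadoop API, and all names are invented.
{code:java}
// Hypothetical byte-level alternative: count UTF-8 code points on the serialized bytes
// and compute a truncation point without materializing a String. Assumes well-formed UTF-8.
// Class and method names are illustrative only; this is not an existing Hive/Hadoop API.
public final class Utf8Truncate {

    /**
     * Returns the number of leading bytes of {@code utf8} that hold at most
     * {@code maxChars} code points; equals {@code utf8.length} if the value already fits.
     */
    public static int truncatedByteLength(byte[] utf8, int maxChars) {
        int chars = 0;
        int i = 0;
        while (i < utf8.length && chars < maxChars) {
            int lead = utf8[i] & 0xFF;
            // The lead byte of a UTF-8 sequence encodes the sequence's total length.
            if (lead < 0x80)      i += 1;  // 1-byte (ASCII)
            else if (lead < 0xE0) i += 2;  // 2-byte sequence
            else if (lead < 0xF0) i += 3;  // 3-byte sequence
            else                  i += 4;  // 4-byte sequence
            chars++;
        }
        return Math.min(i, utf8.length);
    }
}
{code}
A writable could then shrink its stored length to this value in place, paying the decode cost only when a caller actually needs the String value.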
Issue Links
- depends upon: HADOOP-14525 org.apache.hadoop.io.Text Truncate (Open)
- relates to: HIVE-19229 Automatically Convert VARCHAR to STRING (Open)