Details
Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
0.4.0
None
Incompatible change, Reviewed
Description
A recent performance study showed that two places in the Hive code account for a large percentage of CPU usage:
1. String.getBytes() (UTF-8 encoding)
2. String.split()
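The String-based pattern that the study points at can be sketched as follows. This is a hypothetical illustration, not code from the Hive source: the class `StringRowSplitter` and its `splitRow` method are invented for the example, and the `\001` (Ctrl-A) separator is chosen because it is Hive's default field delimiter.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration (not Hive source code) of String-based row handling:
 * each row is decoded from UTF-8 into a String, split into a new array of column
 * Strings, and each column is re-encoded to UTF-8, allocating new objects for
 * every column of every row.
 */
final class StringRowSplitter {

    /** Splits one UTF-8 encoded row and returns the re-encoded bytes of each column. */
    static byte[][] splitRow(byte[] rowBytes, char separator) {
        // UTF-8 decoding: allocates a new String per row.
        String row = new String(rowBytes, StandardCharsets.UTF_8);
        // String.split(): allocates a String[] plus one String per column.
        // Assumes the separator is not a regex metacharacter; limit -1 keeps trailing empty columns.
        String[] columns = row.split(String.valueOf(separator), -1);
        byte[][] encoded = new byte[columns.length][];
        for (int i = 0; i < columns.length; i++) {
            // String.getBytes(): UTF-8 encoding allocates a new byte[] per column.
            encoded[i] = columns[i].getBytes(StandardCharsets.UTF_8);
        }
        return encoded;
    }
}
```

For a row with three columns, this creates one row String, one array, three column Strings and three byte arrays, and the same cost is paid again for every row that is read.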
We should replace String with Text objects in order to:
1. Avoid UTF-8 decoding and encoding.
2. Reuse Text objects instead of creating new objects for each column in each row, as String.split() does.
This is expected to improve Hive's performance substantially (by 20% or more).
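The byte-level alternative can be sketched without Hadoop on the classpath. The sketch is hypothetical: `ReusableRowSplitter` is not a Hive class, and it stands in for Hadoop's `org.apache.hadoop.io.Text` by using a plain byte array plus a length, mirroring how `Text` exposes its contents through `getBytes()` (a backing array that may be longer than the data) and `getLength()`. Column boundaries are written into int arrays that are reused across rows, so no UTF-8 decoding or encoding takes place and, once the arrays are large enough, no objects are allocated per row or per column.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hive's implementation): splits a UTF-8 row into
 * columns by scanning raw bytes for a single-byte ASCII separator. Column
 * boundaries are stored in int arrays that are reused across rows.
 */
final class ReusableRowSplitter {
    private final byte separator;
    private int[] starts = new int[8];   // reused across rows; grown only when needed
    private int[] lengths = new int[8];
    private byte[] row;                  // borrowed reference, like Text.getBytes()
    private int numColumns;

    ReusableRowSplitter(byte separator) {
        this.separator = separator;
    }

    /** Splits the first rowLength bytes of buffer; bytes past rowLength are ignored. */
    void split(byte[] buffer, int rowLength) {
        row = buffer;
        numColumns = 0;
        int start = 0;
        for (int i = 0; i <= rowLength; i++) {
            // The end of the row acts as a final separator.
            // An ASCII separator (< 0x80) never occurs inside a multi-byte UTF-8 character.
            if (i == rowLength || buffer[i] == separator) {
                if (numColumns == starts.length) {
                    starts = Arrays.copyOf(starts, numColumns * 2);
                    lengths = Arrays.copyOf(lengths, numColumns * 2);
                }
                starts[numColumns] = start;
                lengths[numColumns] = i - start;
                numColumns++;
                start = i + 1;
            }
        }
    }

    int numColumns() { return numColumns; }
    int start(int col) { return starts[col]; }
    int length(int col) { return lengths[col]; }
    byte[] bytes() { return row; }

    /** Compares a column with UTF-8 bytes without building a String. */
    boolean columnEquals(int col, byte[] utf8) {
        if (lengths[col] != utf8.length) {
            return false;
        }
        for (int i = 0; i < utf8.length; i++) {
            if (row[starts[col] + i] != utf8[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller that really needs a String can still decode a single column on demand, for example with `new String(s.bytes(), s.start(i), s.length(i), StandardCharsets.UTF_8)`, so the decoding cost is paid only for columns that are actually used.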
Attachments
Issue Links
- is blocked by
  - HIVE-337 LazySimpleSerDe should support multi-level nested array, map, struct types (Closed)
  - HIVE-375 LazySimpleSerDe to directly serialize (append) int/long/byte/short etc to UTF-8 buffer (Closed)
- is related to
  - HIVE-270 Add a lazy-deserialized SerDe for space and cpu efficient serialization of rows with primitive types (Closed)