[HIVE-24693] Convert timestamps to zoned times without string operations - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: None
Labels:
- pull-request-available

Target Version/s: 4.0.0

Description

Parquet DataWritableWriter relies on NanoTimeUtils to convert a timestamp object into a binary value. It does this by calling toString() on the timestamp object and then parsing the resulting String. This particular timestamp does not carry a time zone, so the string is something like:

2021-21-03 12:32:23.0000...

The parse code first tries to parse the string on the assumption that it contains a time zone; when it does not, the code falls back and applies the provided "default time zone". As was noted in ~~HIVE-24353~~, if something fails to parse, it is very expensive to try to parse again. So, for each timestamp in the Parquet file, it:

- Builds a string from the timestamp
- Parses it (throws an exception, then parses again)

There is no need for this kind of string manipulation and parsing; the conversion should just use the epoch millis/seconds/time stored internally in the Timestamp object.

  // Converts Timestamp to TimestampTZ.
  public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
    return parse(ts.toString(), defaultTimeZone);
  }
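
As a rough illustration of the proposed direction, the sketch below attaches a zone to a zone-less timestamp using only numeric fields. It is not the code from the linked pull request. It uses plain java.time types as a stand-in for Hive's Timestamp and TimestampTZ, and the class name ZonedConversionSketch and the convert signature are hypothetical. It assumes the timestamp exposes its wall-clock value as epoch seconds plus nanoseconds.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical stand-in for the conversion described in this issue.
public class ZonedConversionSketch {

  // Converts a zone-less timestamp, given as epoch seconds plus nanos,
  // into a zoned time without building or parsing a String.
  static ZonedDateTime convert(long epochSecond, int nanos, ZoneId defaultTimeZone) {
    // Recover the wall-clock fields; UTC is used only to decode the numbers.
    LocalDateTime local = LocalDateTime.ofEpochSecond(epochSecond, nanos, ZoneOffset.UTC);
    // Attach the default zone while keeping the same wall-clock fields,
    // which mirrors what the string-based fallback path ends up doing.
    return local.atZone(defaultTimeZone);
  }
}
```

Because no parse is attempted, no exception is thrown and caught per row, which is the cost this issue describes.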

Issue Links

is related to

HIVE-24701 Remove String Manipulation from Date Parsing TimestampTZUtil

Resolved

HIVE-24353 performance: Refactor TimestampTZ parsing

Closed

links to

GitHub Pull Request #1938

People

Assignee: David Mollitor

Reporter: David Mollitor

Votes: 0

Watchers: 4

Dates

Created: 27/Jan/21 18:26

Updated: 17/Nov/22 08:49

Resolved: 19/Feb/21 14:50

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 6h 10m