[HIVE-25292] to_unix_timestamp & unix_timestamp should support ENGLISH format by default - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Clients
Labels:
- pull-request-available

Description

Hi,

The to_unix_timestamp function is implemented by GenericUDFToUnixTimeStamp, which uses SimpleDateFormat to parse string-typed timestamps.

However, the SimpleDateFormat is constructed without a Locale argument, so it falls back to the JVM's default locale. As a result, on machines with a non-English default locale, queries such as the following return NULL:

hive> select to_unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
OK
NULL
hive> select unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
OK
NULL
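The locale dependence described above can be reproduced outside Hive with plain SimpleDateFormat. The following is a minimal sketch, not Hive's actual code: the class name LocaleParseDemo and the helper parses are hypothetical, and Locale.CHINESE stands in for whatever non-English default locale the JVM happens to have.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical demo class; not part of Hive.
class LocaleParseDemo {

    // Returns true if `text` matches `pattern` when month names are taken from `locale`.
    public static boolean parses(String text, String pattern, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            // The month name (or another field) is not valid in this locale.
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "16/Mar/2017:12:25:01";
        String pattern = "dd/MMM/yyy:HH:mm:ss";
        // Same input and pattern; only the locale differs.
        System.out.println("ENGLISH parses: " + parses(text, pattern, Locale.ENGLISH));
        System.out.println("CHINESE parses: " + parses(text, pattern, Locale.CHINESE));
    }
}
```

When the formatter is built without an explicit Locale, the second case is effectively what happens on a JVM whose default locale is Chinese.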

I also found that Spark's to_unix_timestamp and unix_timestamp use SimpleDateFormat as well, with Locale.US by default. That avoids the problem above, but it makes it impossible to parse dates written in the local language. For example, in a Chinese environment, Hive parses the following correctly:

hive> select to_unix_timestamp('16/三月/2017:12:25:01', 'dd/MMMM/yyy:HH:mm:ss');
OK
1489638301
Time taken: 0.147 seconds, Fetched: 1 row(s)
OK

Spark, however, returns NULL for the same query.

Because English-formatted dates are the most common, I think two SimpleDateFormat instances are needed: the existing one, which uses the default locale, and a new one initialized with Locale.ENGLISH.
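The two-formatter idea could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the linked pull requests: the class FallbackTimestampParser, its constructor parameters, and the order of attempts (default locale first, then Locale.ENGLISH) are all hypothetical choices made for this example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; names and fallback order are assumptions for illustration.
class FallbackTimestampParser {
    private final SimpleDateFormat localFormat;   // uses the supplied default locale
    private final SimpleDateFormat englishFormat; // always uses Locale.ENGLISH

    public FallbackTimestampParser(String pattern, Locale defaultLocale, TimeZone zone) {
        localFormat = new SimpleDateFormat(pattern, defaultLocale);
        englishFormat = new SimpleDateFormat(pattern, Locale.ENGLISH);
        localFormat.setTimeZone(zone);
        englishFormat.setTimeZone(zone);
    }

    // Returns seconds since the epoch, or null if neither formatter accepts the text.
    public Long toUnixTimestamp(String text) {
        for (SimpleDateFormat format : new SimpleDateFormat[] {localFormat, englishFormat}) {
            try {
                return format.parse(text).getTime() / 1000;
            } catch (ParseException e) {
                // Fall through and try the next formatter.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TimeZone zone = TimeZone.getTimeZone("Asia/Shanghai");
        FallbackTimestampParser shortMonth =
            new FallbackTimestampParser("dd/MMM/yyy:HH:mm:ss", Locale.CHINESE, zone);
        FallbackTimestampParser longMonth =
            new FallbackTimestampParser("dd/MMMM/yyy:HH:mm:ss", Locale.CHINESE, zone);
        // English month name: handled by the Locale.ENGLISH fallback.
        System.out.println(shortMonth.toUnixTimestamp("16/Mar/2017:12:25:01"));
        // Chinese month name from the example above, written with Unicode escapes.
        System.out.println(longMonth.toUnixTimestamp("16/\u4e09\u6708/2017:12:25:01"));
    }
}
```

Trying the default locale first keeps existing local-language queries working, while the English formatter only comes into play when the first attempt fails. SimpleDateFormat is not thread-safe, so a sketch like this assumes each parser instance is used by a single thread.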

Issue Links

links to

GitHub Pull Request #2433

GitHub Pull Request #2467

People

Assignee: shezm

Reporter: shezm

Votes: 0

Watchers: 2

Dates

Created: 26/Jun/21 03:45

Updated: 21/Oct/22 07:20

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

1h 20m