[PHOENIX-4347] Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 4.11.0
Fix Version/s: None
Component/s: None
Labels:
- phoenix
- spark-sql
Environment:

CentOS 6.5, Fedora 25

Description

Created a Phoenix table with below schema:

CREATE TABLE IF NOT EXISTS sample_table (
  id VARCHAR NOT NULL, 
  metricid VARCHAR NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  metricvalue DOUBLE, 
  CONSTRAINT st_pk PRIMARY KEY(id,metricid,timestamp)) SALT_BUCKETS = 20;

Inserted some data into this and loaded as Spark Dataset using the Phoenix spark datasource ('org.apache.phoenix.spark') options.

The Spark Dataset's schema is as given below:

root

– ID: string (nullable = true)

– METRICID: string (nullable = true)

– TIMESTAMP: timestamp (nullable = true)

– METRICVALUE: double (nullable = true)

I apply the Dataset's filter operation on Timestamp column as given below:

Dataset<Row> ds = <Derived from Phoenix>
ds = ds.filter("TIMESTAMP >= CAST('2017-10-31 00:00:00.0' AS TIMESTAMP)")

This operation throws me an exception as:

testPhoenixTimestamp(DatasetTest): org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "00" at line 1, column 145.

The generated query looks like this:

2017-11-02 15:29:31,722 INFO  [main] org.apache.phoenix.mapreduce.PhoenixInputFormat
Select Statement: SELECT "ID","METRICID","TIMESTAMP","0"."METRICVALUE" FROM SAMPLE_TABLE WHERE ( "TIMESTAMP" IS NOT NULL AND "TIMESTAMP" >= 2017-10-31 00:00:00.0)

The issue is with Timestamp filter condition, where the timestamp value is not wrapped in to_timestamp() function.

I have fixed this locally in org.apache.phoenix.spark.PhoenixRelation class compileValue() function, by checking the value's class. If it is java.sql.Timestamp then I am wrapping the value with to_timestamp() function.

Please let me know if there is another way of correctly querying Timestamp values in Phoenix through Spark's Dataset API.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Lokesh Kumar

Votes:: 1 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 02/Nov/17 10:03

Updated:: 18/Apr/23 06:45

Resolved:: 18/Apr/23 06:45