Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-4347

Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.11.0
    • None
    • None
    • CentOS 6.5, Fedora 25

    Description

      Created a Phoenix table with below schema:

      CREATE TABLE IF NOT EXISTS sample_table (
        id VARCHAR NOT NULL, 
        metricid VARCHAR NOT NULL,
        timestamp TIMESTAMP NOT NULL,
        metricvalue DOUBLE, 
        CONSTRAINT st_pk PRIMARY KEY(id,metricid,timestamp)) SALT_BUCKETS = 20;
      

      Inserted some data into this and loaded as Spark Dataset using the Phoenix spark datasource ('org.apache.phoenix.spark') options.

      The Spark Dataset's schema is as given below:

      root

      – ID: string (nullable = true)
      – METRICID: string (nullable = true)
      – TIMESTAMP: timestamp (nullable = true)
      – METRICVALUE: double (nullable = true)

      I apply the Dataset's filter operation on Timestamp column as given below:

      Dataset<Row> ds = <Derived from Phoenix>
      ds = ds.filter("TIMESTAMP >= CAST('2017-10-31 00:00:00.0' AS TIMESTAMP)")
      

      This operation throws me an exception as:

      testPhoenixTimestamp(DatasetTest): org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "00" at line 1, column 145.

      The generated query looks like this:

      2017-11-02 15:29:31,722 INFO  [main] org.apache.phoenix.mapreduce.PhoenixInputFormat
      Select Statement: SELECT "ID","METRICID","TIMESTAMP","0"."METRICVALUE" FROM SAMPLE_TABLE WHERE ( "TIMESTAMP" IS NOT NULL AND "TIMESTAMP" >= 2017-10-31 00:00:00.0)
      

      The issue is with Timestamp filter condition, where the timestamp value is not wrapped in to_timestamp() function.

      I have fixed this locally in org.apache.phoenix.spark.PhoenixRelation class compileValue() function, by checking the value's class. If it is java.sql.Timestamp then I am wrapping the value with to_timestamp() function.

      Please let me know if there is another way of correctly querying Timestamp values in Phoenix through Spark's Dataset API.

      Attachments

        Activity

          People

            Unassigned Unassigned
            lokeshdotp Lokesh Kumar
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: