Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Duplicate
-
4.7.0
-
None
-
None
Description
We are using below versions of Phoenix, HBase and Spark.
Phoenix - 4.7
HBase - 2.6.5
Spark - 2.4
Created a phoenix table by mentioning one of the field datatype as DATE and TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below.
CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL (
"CID" INTEGER,
"CDATE" DATE,
"CTIMESTAMP" TIMESTAMP,
CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID"));
Upserted records using upsert command and below is the data in table.
CID | CDATE | CTIMESTAMP |
1 | 2021-11-21 | 2022-01-18 18:30:33.896 |
2 | 2021-11-18 | 2022-01-18 18:45:59.336 |
3 | 2021-11-17 | 2022-01-18 19:01:04.265 |
Now, reading data from above created table in pyspark shell. We have set spark.sql.session.timeZone=UTC to spark while launching pyspark shell. Also, we have set phoenix.query.dateFormatTimeZone=UTC in hbase-site.xml file. **
Below code snippet read data from phoenix via JDBC and it read DATE datatype field as one day less.
>>> val df = spark.read.format("jdbc") .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver") .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure") .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM NS_TEST.CUSTOMER_TBL) q") .load()
>>>df.printSchema()
root
|-- CID: integer (nullable = true)
|-- CDATE: date (nullable = true)
|-- CTIMESTAMP: timestamp (nullable = true)
>>>df.select('').show(truncate=False){*}
CID | CDATE | CTIMESTAMP |
1 | 2021-11-20 | 2022-01-18 18:30:33.896 |
2 | 2021-11-17 | 2022-01-18 18:45:59.336 |
3 | 2021-11-16 | 2022-01-18 19:01:04.265 |
We have also tried using phoenix data source instead of JDBC and below is the code snippet. It also read DATE datatype field as one day less.
val df2 = spark.read.format("org.apache.phoenix.spark") .option("table", "NS_TEST.CUSTOMER_TBL") .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure") .load()
**
>>>df.printSchema()
root
|-- CID: integer (nullable = true)
|-- CDATE: date (nullable = true)
|-- CTIMESTAMP: timestamp (nullable = true)
>>>df.select('').show(truncate=False){*}
CID | CDATE | CTIMESTAMP |
1 | 2021-11-20 | 2022-01-18 18:30:33.896 |
2 | 2021-11-17 | 2022-01-18 18:45:59.336 |
3 | 2021-11-16 | 2022-01-18 19:01:04.265 |
Please help us on this issue why Phoenix Spark reading DATE datatype field value as one day less.
Attachments
Attachments
Issue Links
- duplicates
-
PHOENIX-5066 The TimeZone is incorrectly used during writing or reading data
- Resolved
- is depended upon by
-
PHOENIX-6882 Umbrella Ticket for date/time handling issues
- Open