[PHOENIX-6623] Phoenix Spark reading DATE datatype value less than one day from phoenix table - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Duplicate
Affects Version/s: 4.7.0
Fix Version/s: None
Component/s: spark-connector
Labels: None

Description

We are using the following versions of Phoenix, HBase, and Spark:

Phoenix - 4.7
HBase - 2.6.5
Spark - 2.4

We created a Phoenix table with one DATE column and one TIMESTAMP column, using SQuirreL SQL. The DDL is given below.

CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL (
"CID" INTEGER,
"CDATE" DATE,
"CTIMESTAMP" TIMESTAMP,
CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID"));

We upserted records with the UPSERT command. The table now contains the following data.

CID	CDATE	CTIMESTAMP

1	2021-11-21	2022-01-18 18:30:33.896
2	2021-11-18	2022-01-18 18:45:59.336
3	2021-11-17	2022-01-18 19:01:04.265

Next, we read data from this table in the pyspark shell. We set spark.sql.session.timeZone=UTC when launching the pyspark shell, and we also set phoenix.query.dateFormatTimeZone=UTC in hbase-site.xml.

The code snippet below reads data from Phoenix via JDBC. It returns each DATE value one day earlier than the stored value.

>>> df = (spark.read.format("jdbc")
...           .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
...           .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure")
...           .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM NS_TEST.CUSTOMER_TBL) q")
...           .load())

>>> df.printSchema()
root
|-- CID: integer (nullable = true)
|-- CDATE: date (nullable = true)
|-- CTIMESTAMP: timestamp (nullable = true)
>>> df.select('*').show(truncate=False)

CID	CDATE	CTIMESTAMP

1	2021-11-20	2022-01-18 18:30:33.896
2	2021-11-17	2022-01-18 18:45:59.336
3	2021-11-16	2022-01-18 19:01:04.265

We also tried the Phoenix data source instead of JDBC, using the code snippet below. It also returns each DATE value one day earlier than the stored value.

>>> df2 = (spark.read.format("org.apache.phoenix.spark")
...          .option("table", "NS_TEST.CUSTOMER_TBL")
...          .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure")
...          .load())

>>> df2.printSchema()
root
|-- CID: integer (nullable = true)
|-- CDATE: date (nullable = true)
|-- CTIMESTAMP: timestamp (nullable = true)
>>> df2.select('*').show(truncate=False)

CID	CDATE	CTIMESTAMP

1	2021-11-20	2022-01-18 18:30:33.896
2	2021-11-17	2022-01-18 18:45:59.336
3	2021-11-16	2022-01-18 19:01:04.265

Please help us understand why Phoenix Spark reads DATE values as one day earlier than the stored values.
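
For illustration only: the linked issue PHOENIX-5066 concerns time zone handling, and one generic way a date can come back a day early is when an instant stored as midnight UTC is converted to a calendar date in a zone west of UTC. The sketch below shows that arithmetic in plain Python. It does not call Phoenix or Spark, the function name calendar_date_in_zone and the UTC-5 offset are hypothetical, and this report does not confirm that this is the mechanism at work here.

```python
from datetime import date, datetime, timedelta, timezone


def calendar_date_in_zone(epoch_millis: int, zone: timezone) -> date:
    """Return the calendar date that an epoch-millisecond instant falls on in `zone`."""
    instant = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return instant.astimezone(zone).date()


# Assumption: the DATE 2021-11-21 is held as the instant 2021-11-21T00:00:00 UTC.
stored_millis = int(datetime(2021, 11, 21, tzinfo=timezone.utc).timestamp() * 1000)

# Hypothetical reader zone, five hours west of UTC.
west_of_utc = timezone(timedelta(hours=-5))

print(calendar_date_in_zone(stored_millis, timezone.utc))  # 2021-11-21
print(calendar_date_in_zone(stored_millis, west_of_utc))   # 2021-11-20
```

In this sketch the same stored instant yields 2021-11-21 in UTC and 2021-11-20 in the western zone, which matches the shape of the one-day difference seen for CID 1 above.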

Attachments

image-2023-07-27-18-08-32-282.png	27/Jul/23 10:08	27 kB	liu
image-2023-07-27-18-10-11-670.png	27/Jul/23 10:10	31 kB	liu
image-2023-07-27-18-10-42-609.png	27/Jul/23 10:10	32 kB	liu
image-2023-07-27-18-11-47-404.png	27/Jul/23 10:11	22 kB	liu
image-2023-07-27-19-33-50-780.png	27/Jul/23 11:33	71 kB	liu

Issue Links

duplicates: PHOENIX-5066 The TimeZone is incorrectly used during writing or reading data (Resolved)
is depended upon by: PHOENIX-6882 Umbrella Ticket for date/time handling issues (Open)

People

Assignee: Istvan Toth
Reporter: Anand
Votes: 0
Watchers: 4

Dates

Created: 19/Jan/22 17:35
Updated: 21/Feb/24 05:45
Resolved: 23/Mar/23 07:55