Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-6623

Phoenix Spark reading DATE datatype value less than one day from phoenix table

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Duplicate
    • 4.7.0
    • None
    • spark-connector
    • None

    Description

      We are using below versions of Phoenix, HBase and Spark.

      Phoenix - 4.7
      HBase - 2.6.5
      Spark - 2.4

      Created a phoenix table by mentioning one of the field datatype as DATE and TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below.

      CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL (
      "CID" INTEGER,
      "CDATE" DATE,
      "CTIMESTAMP" TIMESTAMP,
      CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID"));

      Upserted records using upsert command and below is the data in table.

       

      CID CDATE          CTIMESTAMP                 

       

      1    2021-11-21 2022-01-18 18:30:33.896
      2    2021-11-18 2022-01-18 18:45:59.336
      3    2021-11-17 2022-01-18 19:01:04.265

       

      Now, reading data from above created table in pyspark shell. We have set spark.sql.session.timeZone=UTC to spark while launching pyspark shell. Also, we have set phoenix.query.dateFormatTimeZone=UTC in hbase-site.xml file. ** 

      Below code snippet read data from phoenix via JDBC and it read DATE datatype field as one day less.

      >>> val df = spark.read.format("jdbc")
                .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
                .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure")
                .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM NS_TEST.CUSTOMER_TBL) q")
      .load()

      >>>df.printSchema()
      root
       |-- CID: integer (nullable = true)
       |-- CDATE: date (nullable = true)
       |-- CTIMESTAMP: timestamp (nullable = true)
      >>>df.select('').show(truncate=False){*}

      CID CDATE          CTIMESTAMP                   

       

      1      2021-11-20 2022-01-18 18:30:33.896
      2      2021-11-17 2022-01-18 18:45:59.336
      3      2021-11-16 2022-01-18 19:01:04.265

       

      We have also tried using phoenix data source instead of JDBC and below is the code snippet. It also read DATE datatype field as one day less.

      val df2 = spark.read.format("org.apache.phoenix.spark")
               .option("table", "NS_TEST.CUSTOMER_TBL")
               .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure")
               .load()

      **
      >>>df.printSchema()
      root
       |-- CID: integer (nullable = true)
       |-- CDATE: date (nullable = true)
       |-- CTIMESTAMP: timestamp (nullable = true)
      >>>df.select('').show(truncate=False){*}

      CID CDATE          CTIMESTAMP                   

       

      1      2021-11-20 2022-01-18 18:30:33.896
      2      2021-11-17 2022-01-18 18:45:59.336
      3      2021-11-16 2022-01-18 19:01:04.265

       

      Please help us on this issue why Phoenix Spark reading DATE datatype field value as one day less.

      Attachments

        Issue Links

          Activity

            People

              stoty Istvan Toth
              adike Anand
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: