SPARK-32611: Querying an ORC table in Spark 3 with spark.sql.orc.impl=hive produces incorrect results when a timestamp is present in the predicate

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.0.0, 3.0.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Flags: Important

    Description

      How to reproduce this behavior?

      • TZ="America/Los_Angeles" ./bin/spark-shell
      • sql("set spark.sql.hive.convertMetastoreOrc=true")
      • sql("set spark.sql.orc.impl=hive")
      • sql("create table t_spark(col timestamp) stored as orc;")
      • sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp));")
      • sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
        This returns an empty result, which is incorrect.
      • sql("set spark.sql.orc.impl=native")
      • sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
        This will return 1 row, which is the expected output.

      With spark.sql.hive.convertMetastoreOrc=true and spark.sql.orc.impl=hive, the same query returns the correct result when ORC filter pushdown is turned off:

      • sql("set spark.sql.orc.filterPushdown=false")
      • sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
        This will return 1 row, which is the expected output.
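
      The three runs above can be compared side by side with a short script. This is a minimal sketch, not part of the original report: it assumes it is pasted into the same spark-shell session (so `spark` is already defined and t_spark has been created and populated), and `rowsWith` is a hypothetical helper name introduced here for illustration. Per the report, the hive run with pushdown enabled would yield 0 rows and the other two runs 1 row; since the issue was resolved as Cannot Reproduce, a given build may well return 1 row in all three cases.

      ```scala
      // Assumes a spark-shell started with TZ="America/Los_Angeles" and an
      // existing, populated t_spark table (see the steps above).
      val query =
        """select col, date_format(col, 'DD') from t_spark
          |where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp)""".stripMargin

      // Hypothetical helper: apply the given session settings, run the query,
      // and return the number of rows that come back.
      def rowsWith(settings: (String, String)*): Int = {
        settings.foreach { case (key, value) => spark.conf.set(key, value) }
        spark.sql(query).collect().length
      }

      spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

      val hiveWithPushdown = rowsWith(
        "spark.sql.orc.impl" -> "hive",
        "spark.sql.orc.filterPushdown" -> "true")
      val hiveWithoutPushdown = rowsWith(
        "spark.sql.orc.impl" -> "hive",
        "spark.sql.orc.filterPushdown" -> "false")
      val nativeWithPushdown = rowsWith(
        "spark.sql.orc.impl" -> "native",
        "spark.sql.orc.filterPushdown" -> "true")

      println(s"hive + pushdown:    $hiveWithPushdown row(s)")
      println(s"hive, no pushdown:  $hiveWithoutPushdown row(s)")
      println(s"native + pushdown:  $nativeWithPushdown row(s)")
      ```

      Setting spark.sql.orc.filterPushdown explicitly in every run keeps a value left over from an earlier step from affecting the comparison.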

          People

            Assignee: Unassigned
            Reporter: Sumeet (sumeet.gajjar)
            Votes: 0
            Watchers: 5