Apache Drill / DRILL-6994

TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.14.0
    • Fix Version/s: None
    • Component/s: Execution - Data Types
    • Labels: None

    Description

      A timestamp type column in a parquet file created from Spark is treated as VARBINARY by Drill 1.14.0. Trying to cast the DOB column to DATE results in an exception, although the month in the data (10) is within the allowed monthOfYear range.

      Data used in the test

      [test@md123 spark_data]# cat inferSchema_example.csv
      Name,Department,years_of_experience,DOB
      Sam,Software,5,1990-10-10
      Alex,Data Analytics,3,1992-10-10
      

      Create the parquet file using the above CSV file

      [test@md123 bin]# ./spark-shell
      19/01/22 21:21:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      Spark context Web UI available at http://md123.qa.lab:4040
      Spark context available as 'sc' (master = local[*], app id = local-1548192099796).
      Spark session available as 'spark'.
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
            /_/
      
      Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> import org.apache.spark.sql.{DataFrame, SQLContext}
      import org.apache.spark.sql.{DataFrame, SQLContext}
      
      scala> import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.{SparkConf, SparkContext}
      
      scala> val sqlContext: SQLContext = new SQLContext(sc)
      warning: there was one deprecation warning; re-run with -deprecation for details
      sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2e0163cb
      
      scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/apps/inferSchema_example.csv")
      df: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 more fields]
      
      scala> df.printSchema
      root
       |-- Name: string (nullable = true)
       |-- Department: string (nullable = true)
       |-- years_of_experience: integer (nullable = true)
       |-- DOB: timestamp (nullable = true)
      
      scala> df.write.parquet("/apps/infer_schema_example.parquet")
      
      // Read the parquet file
      scala> val data = sqlContext.read.parquet("/apps/infer_schema_example.parquet")
      data: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 more fields]
      
      // Print the schema of the parquet file from Spark
      scala> data.printSchema
      root
       |-- Name: string (nullable = true)
       |-- Department: string (nullable = true)
       |-- years_of_experience: integer (nullable = true)
       |-- DOB: timestamp (nullable = true)
      
      // Display the contents of the parquet file in spark-shell:
      // register a temp table, then show all records.
      
      scala> data.registerTempTable("employee")
      warning: there was one deprecation warning; re-run with -deprecation for details
      
      scala> val allrecords = sqlContext.sql("SELeCT * FROM employee")
      allrecords: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 more fields]
      
      scala> allrecords.show()
      +----+--------------+-------------------+-------------------+
      |Name|    Department|years_of_experience|                DOB|
      +----+--------------+-------------------+-------------------+
      | Sam|      Software|                  5|1990-10-10 00:00:00|
      |Alex|Data Analytics|                  3|1992-10-10 00:00:00|
      +----+--------------+-------------------+-------------------+
      

      Querying the parquet file from Drill 1.14.0-mapr results in the DOB column (timestamp type in Spark) being treated as VARBINARY.

      apache drill 1.14.0-mapr
      "a little sql for your nosql"
      0: jdbc:drill:schema=dfs.tmp> select * from dfs.`/apps/infer_schema_example.parquet`;
      +-------+-----------------+----------------------+--------------+
      | Name  | Department      | years_of_experience  | DOB          |
      +-------+-----------------+----------------------+--------------+
      | Sam   | Software        | 5                    | [B@2bef51f2  |
      | Alex  | Data Analytics  | 3                    | [B@650eab8   |
      +-------+-----------------+----------------------+--------------+
      2 rows selected (0.229 seconds)
      
      // typeof(DOB) returns VARBINARY, whereas Spark reports the parquet schema for DOB as timestamp (nullable = true)
      
      0: jdbc:drill:schema=dfs.tmp> select typeof(DOB) from dfs.`/apps/infer_schema_example.parquet`;
      +------------+
      | EXPR$0     |
      +------------+
      | VARBINARY  |
      | VARBINARY  |
      +------------+
      2 rows selected (0.199 seconds)
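
A likely explanation, not confirmed in this report: Spark 2.3 writes timestamps as Parquet INT96 by default, and Drill exposes INT96 values as VARBINARY unless its `store.parquet.reader.int96_as_timestamp` option is enabled (Drill also documents `CONVERT_FROM(col, 'TIMESTAMP_IMPALA')` for this case). The Python sketch below illustrates what those 12 bytes would contain under that assumption: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The function names are hypothetical and the sample bytes are generated locally, not read from the file in this report.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_EPOCH = 2440588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte Parquet INT96 timestamp into a UTC datetime."""
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # Little-endian layout: 8 bytes nanoseconds-of-day, then 4 bytes Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH
    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
    return EPOCH + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)


def encode_int96_timestamp(ts: datetime) -> bytes:
    """Inverse of decode_int96_timestamp; used here only to build sample input."""
    delta = ts - EPOCH
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    return struct.pack("<qi", nanos_of_day, delta.days + JULIAN_DAY_OF_EPOCH)


# Hypothetical sample: the first DOB value from the CSV, taken as UTC midnight.
sample = encode_int96_timestamp(datetime(1990, 10, 10, tzinfo=timezone.utc))
print(decode_int96_timestamp(sample).isoformat())
```

Spark normalizes INT96 timestamps to UTC when writing, so bytes taken from a real file may decode to a value shifted by the writer's session time zone.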
      

      // Casting DOB to DATE throws an exception, even though the month in the data (10) is within the range [1,12]

      0: jdbc:drill:schema=dfs.tmp> select cast(DOB as DATE) from dfs.`/apps/infer_schema_example.parquet`;
      Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
      
      Fragment 0:0
      
      [Error Id: 536c67d8-77c4-4b36-8aec-743344141d31 on md123.qa.lab:31010] (state=,code=0)
      

      Stack trace from drillbit.log

      2019-01-22 22:13:27,334 [23b86a78-64fc-5873-87b5-7e95d9740e51:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
      
      Fragment 0:0
      
      [Error Id: 536c67d8-77c4-4b36-8aec-743344141d31 on md123.qa.lab:31010]
      org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
      
      Fragment 0:0
      
      [Error Id: 536c67d8-77c4-4b36-8aec-743344141d31 on md123.qa.lab:31010]
       at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-mapr.jar:1.14.0-mapr]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
       at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
      Caused by: org.joda.time.IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
       at org.joda.time.field.FieldUtils.verifyValueBounds(FieldUtils.java:252) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
       at org.joda.time.chrono.BasicChronology.getDateMidnightMillis(BasicChronology.java:612) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
       at org.joda.time.chrono.BasicChronology.getDateTimeMillis(BasicChronology.java:159) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
       at org.joda.time.chrono.AssembledChronology.getDateTimeMillis(AssembledChronology.java:120) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate(StringFunctionHelpers.java:210) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.test.generated.ProjectorGen977.doEval(ProjectorTemplate.java:41) ~[na:na]
       at org.apache.drill.exec.test.generated.ProjectorGen977.projectRecords(ProjectorTemplate.java:67) ~[na:na]
       at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:231) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:281) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_181]
       at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_181]
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) ~[hadoop-common-2.7.0-mapr-1808.jar:na]
       at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:281) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
       ... 4 common frames omitted
      

          People

            Assignee: Unassigned
            Reporter: Khurram Faraaz (khfaraaz)