Spark / SPARK-17123

Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      The following example program causes SpecificSafeProjection code generation to produce Java code which doesn't compile:

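      import org.apache.spark.sql.Row
      import org.apache.spark.sql.types._
      // Turn off the codegen fallback so that compilation failures are reported rather than masked.
      spark.sql("set spark.sql.codegen.fallback=false")
      // A single-row DataFrame with one DateType column.
      val dateDF = spark.createDataFrame(sc.parallelize(Seq(Row(new java.sql.Date(0)))), StructType(StructField("value", DateType) :: Nil))
      // A single-row DataFrame with one StringType column holding the same date as text.
      val stringDF = sc.parallelize(Seq(new java.sql.Date(0).toString)).toDF
      dateDF.union(stringDF).collect()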
      

      This fails at runtime with the following error:

      failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 28, Column 107: No applicable constructor/method found for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
      /* 001 */ public java.lang.Object generate(Object[] references) {
      /* 002 */   return new SpecificSafeProjection(references);
      /* 003 */ }
      /* 004 */
      /* 005 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
      /* 006 */
      /* 007 */   private Object[] references;
      /* 008 */   private MutableRow mutableRow;
      /* 009 */   private Object[] values;
      /* 010 */   private org.apache.spark.sql.types.StructType schema;
      /* 011 */
      /* 012 */
      /* 013 */   public SpecificSafeProjection(Object[] references) {
      /* 014 */     this.references = references;
      /* 015 */     mutableRow = (MutableRow) references[references.length - 1];
      /* 016 */
      /* 017 */     this.schema = (org.apache.spark.sql.types.StructType) references[0];
      /* 018 */   }
      /* 019 */
      /* 020 */   public java.lang.Object apply(java.lang.Object _i) {
      /* 021 */     InternalRow i = (InternalRow) _i;
      /* 022 */
      /* 023 */     values = new Object[1];
      /* 024 */
      /* 025 */     boolean isNull2 = i.isNullAt(0);
      /* 026 */     UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
      /* 027 */     boolean isNull1 = isNull2;
      /* 028 */     final java.sql.Date value1 = isNull1 ? null : org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(value2);
      /* 029 */     isNull1 = value1 == null;
      /* 030 */     if (isNull1) {
      /* 031 */       values[0] = null;
      /* 032 */     } else {
      /* 033 */       values[0] = value1;
      /* 034 */     }
      /* 035 */
      /* 036 */     final org.apache.spark.sql.Row value = new org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema(values, schema);
      /* 037 */     if (false) {
      /* 038 */       mutableRow.setNullAt(0);
      /* 039 */     } else {
      /* 040 */
      /* 041 */       mutableRow.update(0, value);
      /* 042 */     }
      /* 043 */
      /* 044 */     return mutableRow;
      /* 045 */   }
      /* 046 */ }
      

      Here, the invocation of DateTimeUtils.toJavaDate on line 028 is incorrect: the generated code passes it a UTF8String (value2), but the method expects an int (the internal days-since-epoch representation of a date).
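
A possible workaround on affected versions is to make the column types agree before the set operation, so that no implicit string/date widening is involved. The following is a minimal sketch under that assumption, not the fix that was committed for this issue; `UnionWithExplicitCast` and `unionAsDates` are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DateType, StructField, StructType}

object UnionWithExplicitCast {
  // Hypothetical helper (not part of Spark): builds the two inputs from the
  // report and unions them after casting the string side to DateType.
  def unionAsDates(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // One DateType column, as in the original reproduction.
    val dateDF = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row(new java.sql.Date(0)))),
      StructType(StructField("value", DateType) :: Nil))

    // One StringType column holding the same date as text.
    val stringDF = Seq(new java.sql.Date(0).toString).toDF("value")

    // Assumption: with identical column types on both sides, no implicit
    // widening is needed, so the result schema still matches the left side.
    dateDF.union(stringDF.select($"value".cast(DateType).as("value")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-17123-workaround-sketch")
      .getOrCreate()
    try {
      val result = unionAsDates(spark)
      result.printSchema()
      result.collect().foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Casting the date column to StringType instead would serve equally well if string output is acceptable; the point is only that both inputs share one type before `union` is called.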

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Josh Rosen (joshrosen)