SPARK-24583: Wrong schema type in InsertIntoDataSourceCommand

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.2, 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      For a DataSource table whose schema contains a field with nullable = false, inserting a NULL value into that field makes the input DataFrame return an incorrect value or throw a NullPointerException. The cause is that the nullability of the input relation's schema is bluntly overridden with the destination table's schema by the following code in InsertIntoDataSourceCommand:

        override def run(sparkSession: SparkSession): Seq[Row] = {
          val relation = logicalRelation.relation.asInstanceOf[InsertableRelation]
          val data = Dataset.ofRows(sparkSession, query)
          // Apply the schema of the existing table to the new data.
          val df = sparkSession.internalCreateDataFrame(
            data.queryExecution.toRdd, logicalRelation.schema)
          relation.insert(df, overwrite)

          // Re-cache all cached plans (including this relation itself, if it's
          // cached) that refer to this data source relation.
          sparkSession.sharedState.cacheManager.recacheByPlan(sparkSession, logicalRelation)

          Seq.empty[Row]
        }
      
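      The hazard can be seen in isolation, outside the insert path: stamping a non-nullable schema onto data that actually contains NULLs lets the optimizer make unsound assumptions. Below is a minimal, self-contained sketch (the app name and column name are illustrative, not from this issue); depending on the Spark version, the symptom is a wrong value, a silently dropped null check, or a NullPointerException:

        import org.apache.spark.sql.{Row, SparkSession}
        import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

        object NullabilityOverrideDemo {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("NullabilityOverrideDemo")
              .master("local[*]")
              .getOrCreate()

            // The input genuinely contains a NULL ...
            val rows = spark.sparkContext.parallelize(Seq(Row(1), Row(null)))

            // ... but the schema bluntly claims the column is non-nullable,
            // mirroring how InsertIntoDataSourceCommand re-stamped the input
            // with the destination table's schema.
            val lyingSchema = StructType(Seq(StructField("i", IntegerType, nullable = false)))
            val df = spark.createDataFrame(rows, lyingSchema)

            // Because the column claims nullable = false, the optimizer may
            // eliminate this null check entirely, so the NULL row can appear
            // in the output instead of being filtered out.
            df.filter("i IS NOT NULL").show()

            spark.stop()
          }
        }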

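      For comparison, here is a sketch of the direction a fix would take (an assumption based on this issue being resolved as Fixed, not a verbatim copy of the merged patch): keep the input Dataset's own schema, whose nullability is correct for the actual rows, instead of re-stamping it with logicalRelation.schema. The analyzer has already cast the query output to the table's column types before this command runs:

        override def run(sparkSession: SparkSession): Seq[Row] = {
          val relation = logicalRelation.relation.asInstanceOf[InsertableRelation]
          val data = Dataset.ofRows(sparkSession, query)
          // Pass the data through unchanged: its column types already match
          // the table's, and its nullability reflects the actual rows.
          relation.insert(data, overwrite)

          sparkSession.sharedState.cacheManager.recacheByPlan(sparkSession, logicalRelation)

          Seq.empty[Row]
        }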

    Attachments

    Activity

    People

    • Assignee: Maryann Xue (maryannxue)
    • Reporter: Maryann Xue (maryannxue)
    • Votes: 0
    • Watchers: 2

    Dates

    • Created:
    • Updated:
    • Resolved: