  1. Spark
  2. SPARK-41279 Feature parity: DataFrame API in Spark Connect
  3. SPARK-41899

DataFrame.createDataFrame converting int to bigint

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      dt = datetime.date(2021, 12, 27)
      
      # Note: a Python int is inferred as a LongType column, which date_add
      # does not accept, so the schema declares "add" as integer explicitly.
      df = self.spark.createDataFrame([Row(date=dt, add=2)], "date date, add integer")
      
      self.assertTrue(
          all(
              df.select(
                  date_add(df.date, df.add) == datetime.date(2021, 12, 29),
                  date_add(df.date, "add") == datetime.date(2021, 12, 29),
                  date_add(df.date, 3) == datetime.date(2021, 12, 30),
              ).first()
          )
      )
      
      Traceback (most recent call last):
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 391, in test_date_add_function
          ).first()
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 246, in first
          return self.head()
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 310, in head
          rs = self.head(1)
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 312, in head
          return self.take(n)
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 317, in take
          return self.limit(num).collect()
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1076, in collect
          table = self._session.client.to_table(query)
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 414, in to_table
          table, _ = self._execute_and_fetch(req)
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 586, in _execute_and_fetch
          self._handle_error(rpc_error)
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 625, in _handle_error
          raise SparkConnectAnalysisException(
      pyspark.sql.connect.client.SparkConnectAnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "date_add(date, add)" due to data type mismatch: Parameter 2 requires the ("INT" or "SMALLINT" or "TINYINT") type, however "add" has the type "BIGINT".
      Plan: 'GlobalLimit 1
      +- 'LocalLimit 1
         +- 'Project [unresolvedalias('`==`(date_add(date#753, add#754L), 2021-12-29), None), unresolvedalias('`==`(date_add(date#753, add#754L), 2021-12-29), None), (date_add(date#753, 3) = 2021-12-30) AS (date_add(date, 3) = DATE '2021-12-30')#759]
            +- Project [date#753, add#754L]
               +- Project [date#749 AS date#753, add#750L AS add#754L]
                  +- LocalRelation [date#749, add#750L]
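
The error above comes down to a gap between the declared schema ("add integer") and the type inferred from the Python value (BIGINT, as the message reports). The following is a minimal, Spark-free sketch of that comparison. The type mapping is deliberately simplified, and parse_ddl, inferred_types, and schema_mismatches are hypothetical helpers written for illustration, not PySpark APIs.

```python
import datetime

# Simplified, assumed mapping from Python value types to Spark SQL type names.
# The int -> bigint entry mirrors the inference described in this report.
INFERRED_SQL_TYPE = {
    bool: "boolean",
    int: "bigint",
    float: "double",
    str: "string",
    datetime.date: "date",
}

# DDL spellings normalised to a single name (illustrative subset only).
ALIASES = {"integer": "int", "long": "bigint"}


def parse_ddl(schema):
    """Parse a flat DDL string like "date date, add integer" into {name: type}."""
    fields = {}
    for part in schema.split(","):
        name, sql_type = part.split()
        sql_type = sql_type.lower()
        fields[name] = ALIASES.get(sql_type, sql_type)
    return fields


def inferred_types(row):
    """Return {name: inferred type} for a row given as a plain dict."""
    return {name: INFERRED_SQL_TYPE[type(value)] for name, value in row.items()}


def schema_mismatches(row, schema):
    """Return {name: (declared, inferred)} for fields whose types disagree."""
    declared = parse_ddl(schema)
    inferred = inferred_types(row)
    return {
        name: (declared[name], inferred[name])
        for name in declared
        if declared[name] != inferred[name]
    }


row = {"date": datetime.date(2021, 12, 27), "add": 2}
mismatches = schema_mismatches(row, "date date, add integer")
print(mismatches)  # {'add': ('int', 'bigint')}
```

If the declared schema is dropped and only inference is used, "add" ends up as bigint, which is the type date_add rejects in the traceback above.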

          People

            Assignee: Ruifeng Zheng (podongfeng)
            Reporter: Sandeep Singh (techaddict)
            Votes: 0
            Watchers: 3