  1. Spark
  2. SPARK-28533

PySpark datatype casting error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.4.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels:
      None

      Description

      Hello,

      I ran into an issue while casting the data type of a column in PySpark 2.4.1.

      Say I have the following DataFrame, in which column B is a string that holds a list of arrays. I want to convert column B to an ArrayType, so I used the following code:

      import ast
      from pyspark.sql.types import *
      from pyspark.sql.functions import udf, col
      
      df = spark.createDataFrame([("row1", "[[12.46575,13.78697],[10.565,11]]"),  ("row2", "[[1.2345,13.45454],[6.6868,0.234524]]")], schema=['A', 'B'])
      to_array = udf(lambda x: ast.literal_eval(x.replace('\"', '')), ArrayType(ArrayType(DoubleType())))
      df = df.withColumn('C', to_array(col('B')))
      df.show(truncate=False)

      The new column C is an ArrayType of ArrayType with elements of DoubleType. However, this code fails to convert the integer value 11: it is missing from the final output.

      A     B                                        C
      row1  [[12.46575,13.78697],[10.565,11]]        [[12.46575, 13.78697], [10.565,]]
      row2  [[1.2345,13.45454],[6.6868,0.234524]]    [[1.2345, 13.45454], [6.6868, 0.234524]]

      As you can see, column C in row1 does not contain 11. If I replace DoubleType with FloatType, I get the same result, and if I replace it with DecimalType, every value in the output is empty.

      I am not sure whether there is an issue with my code or whether this is a bug.

      I hope someone can provide some clarification on this. Thanks!
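A likely explanation, consistent with the Invalid resolution, is that `ast.literal_eval` returns a Python `int` for the literal `11`, and PySpark does not coerce UDF return values to the declared type: a value whose Python type does not match the declared element type generally comes back as null. The same reasoning would explain the DecimalType result, since none of the parsed values are `decimal.Decimal` instances. The sketch below is one possible workaround, not a confirmed fix from the ticket. The helper name `parse_nested_doubles` is hypothetical, and it assumes every element can be converted with `float`.

```python
import ast


def parse_nested_doubles(text):
    """Parse a string such as "[[1.5,2],[3,4.25]]" into a list of lists of floats."""
    if text is None:
        return None
    # Same parsing step as the lambda in the report.
    rows = ast.literal_eval(text.replace('"', ''))
    # Coerce every element to float so an int literal such as 11 becomes 11.0,
    # matching the declared DoubleType element type.
    return [[float(value) for value in row] for row in rows]


# With a live SparkSession, the helper could replace the lambda from the report:
#   to_array = udf(parse_nested_doubles, ArrayType(ArrayType(DoubleType())))
#   df = df.withColumn('C', to_array(col('B')))
```

Keeping the parsing logic in a plain function makes it possible to check the returned Python types without starting Spark.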

       

            People

            • Assignee:
              Unassigned
              Reporter:
              roopteja RoopTeja Muppalla