
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 3.4.0
    • Fix Version: 3.4.0
    • Components: Connect, PySpark
    • Labels: None

    Description

        from pyspark.sql import Row

        data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]

        # +---+-----+
        # | id|value|
        # +---+-----+
        # |  1|  NaN|
        # |  2| 42.0|
        # |  3| null|
        # +---+-----+

        cdf = self.connect.createDataFrame(data)
        sdf = self.spark.createDataFrame(data)

        print()
        print()
        print(cdf._show_string(100, 100, False))
        print()
        print(cdf.schema)
        print()
        print(sdf._jdf.showString(100, 100, False))
        print()
        print(sdf.schema)

        self.compare_by_show(cdf, sdf)
      
      +---+-----+
      | id|value|
      +---+-----+
      |  1| null|
      |  2| 42.0|
      |  3| null|
      +---+-----+
      
      
      StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])
      
      +---+-----+
      | id|value|
      +---+-----+
      |  1|  NaN|
      |  2| 42.0|
      |  3| null|
      +---+-----+
      
      
      StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])
      
      

      This issue is due to `createDataFrame` not handling None/NaN properly:

      1. In the conversion from the local data to a pd.DataFrame, both None and NaN are automatically converted to NaN.
      2. Then, in the conversion from the pd.DataFrame to a pa.Table, every NaN is converted to null.
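      The two-step lossy conversion can be reproduced outside Spark with pandas and PyArrow directly (a minimal standalone sketch; the column name `value` is illustrative):

      ```python
      import pandas as pd
      import pyarrow as pa

      # Step 1: building a pd.DataFrame from local rows coerces the column to
      # float64, and pandas uses NaN as its missing-value sentinel, so the
      # original None becomes NaN and is indistinguishable from a real NaN.
      pdf = pd.DataFrame({"value": [float("nan"), 42.0, None]})
      print(pdf["value"].tolist())  # [nan, 42.0, nan]

      # Step 2: Table.from_pandas applies pandas null semantics, so every NaN
      # in the float column becomes an Arrow null.
      table = pa.Table.from_pandas(pdf)
      print(table.column("value").to_pylist())  # [None, 42.0, None]
      ```

      Step 1 makes None and NaN indistinguishable, and step 2 then maps the NaN sentinel to Arrow null, which is why the Connect DataFrame above shows `null` where the original row had `NaN`.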

          People

            Assignee: podongfeng Ruifeng Zheng
            Reporter: podongfeng Ruifeng Zheng
