Spark / SPARK-37039

np.nan series.astype(bool) should be True

    Description

      Casting a series that contains np.nan with astype(bool) should yield True for the NaN entries, rather than False:

      https://github.com/apache/spark/blob/46bcef7472edd40c23afd9ac74cffe13c6a608ad/python/pyspark/pandas/data_type_ops/base.py#L147

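      In pandas, each of the following casts returns the same result:

      >>> import datetime
      >>> import numpy as np
      >>> import pandas as pd
      >>> pd.Series([1, 2, np.nan], dtype=float).astype(bool)
      >>> pd.Series([1, 2, np.nan], dtype=str).astype(bool)
      >>> pd.Series([datetime.date(1994, 1, 31), datetime.date(1994, 2, 1), np.nan]).astype(bool)
      0    True
      1    True
      2    True
      dtype: bool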

      But in pandas-on-Spark (pyspark.pandas), the same casts return False for the NaN entry:
      0     True
      1     True
      2    False
      dtype: bool
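
      The pandas side of this report can be checked with a short script. This is a minimal sketch, assuming pandas and NumPy are installed; the variable names are illustrative and do not come from the issue or from the Spark code base. It covers only the float and date cases, and it also shows the underlying reason: NaN is a non-zero float, so plain Python already treats it as truthy.

```python
import datetime

import numpy as np
import pandas as pd

# NaN is a non-zero float, so Python's own truthiness rule gives True.
nan_is_truthy = bool(float("nan"))

# Float series: pandas casts every non-zero value, including NaN, to True.
float_result = pd.Series([1, 2, np.nan], dtype=float).astype(bool).tolist()

# Object series of dates plus NaN: each element is cast by its truthiness.
date_result = (
    pd.Series([datetime.date(1994, 1, 31), datetime.date(1994, 2, 1), np.nan])
    .astype(bool)
    .tolist()
)

print(nan_is_truthy, float_result, date_result)
```

      Running the same casts on a pyspark.pandas Series and comparing against these lists would be one way to confirm the discrepancy; that step is left out here because it needs a Spark session.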

          People

            itholic Haejoon Lee
            yikunkero Yikun Jiang