Spark / SPARK-37657

Support str and timestamp for (Series|DataFrame).describe()

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      Originally reported in the Koalas issue: https://github.com/databricks/koalas/issues/1888

       

      In pandas API on Spark, `(Series|DataFrame).describe()` raises a ValueError when the DataFrame has no numeric columns, for example when it contains only string columns:

       

       

      >>> import pyspark.pandas as ps
      >>> df = ps.DataFrame({'a': ["a", "b", "c"]})
      >>> df.describe()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/.../python/pyspark/pandas/frame.py", line 7582, in describe
          raise ValueError("Cannot describe a DataFrame without columns")
      ValueError: Cannot describe a DataFrame without columns 
      

       

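      pandas handles this case (for string columns it reports count, unique, top, and freq), so pandas API on Spark should match that behavior instead of raising.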
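
      To make the expected result concrete, here is a minimal pure-Python sketch of the four statistics pandas reports for a string column: count, unique, top, and freq. The helper name `describe_strings` is hypothetical and used only for illustration. This is not the pandas or PySpark implementation, and how ties for `top` are broken is an assumption of this sketch.

```python
from collections import Counter


def describe_strings(values):
    """Hypothetical helper: summary statistics for a non-numeric column."""
    # Missing values (None) are excluded from the count.
    present = [v for v in values if v is not None]
    counts = Counter(present)
    # most_common(1) yields the most frequent value and its frequency;
    # fall back to (None, None) for an empty column.
    top, freq = counts.most_common(1)[0] if counts else (None, None)
    return {
        "count": len(present),   # number of non-missing values
        "unique": len(counts),   # number of distinct values
        "top": top,              # most frequent value
        "freq": freq,            # frequency of the most frequent value
    }


print(describe_strings(["a", "b", "a"]))
# {'count': 3, 'unique': 2, 'top': 'a', 'freq': 2}
```

      A fixed `describe()` would be expected to return rows of this kind for string columns rather than raising the ValueError shown above.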

          People

            Assignee: itholic Haejoon Lee
            Reporter: itholic Haejoon Lee
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: