Spark / SPARK-21100

Add summary method as alternative to describe that gives quartiles similar to Pandas

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

      Description

      The DataFrame describe method should also include quartiles (25th, 50th, and 75th percentiles) like Pandas.

      Example pandas output:

      In [4]: df.describe()
      Out[4]:
             Unnamed: 0       displ         year         cyl         cty         hwy
      count  234.000000  234.000000   234.000000  234.000000  234.000000  234.000000
      mean   117.500000    3.471795  2003.500000    5.888889   16.858974   23.440171
      std     67.694165    1.291959     4.509646    1.611534    4.255946    5.954643
      min      1.000000    1.600000  1999.000000    4.000000    9.000000   12.000000
      25%     59.250000    2.400000  1999.000000    4.000000   14.000000   18.000000
      50%    117.500000    3.300000  2003.500000    6.000000   17.000000   24.000000
      75%    175.750000    4.600000  2008.000000    8.000000   19.000000   27.000000
      max    234.000000    7.000000  2008.000000    8.000000   35.000000   44.000000
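
      To make the requested quartile rows concrete, here is a minimal standard-library sketch that reproduces the "Unnamed: 0" column above. It assumes that column holds the integers 1 through 234, which is consistent with its count, min, max, and mean but is not stated in the issue. The function name describe_with_quartiles is hypothetical, and this is not how Spark or pandas compute the values internally; it only illustrates linear-interpolation quartiles of the kind the pandas output shows.

```python
import statistics


def describe_with_quartiles(values):
    """Return pandas-style summary statistics for a sequence of numbers.

    Hypothetical helper for illustration only; requires at least two values.
    """
    data = sorted(values)
    # method="inclusive" treats min and max as the 0th and 100th percentiles
    # and interpolates linearly between neighbouring data points.
    q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": data[-1],
    }


# Assumption: the "Unnamed: 0" column is a 1-based row index, 1..234.
stats = describe_with_quartiles(range(1, 235))
for name, value in stats.items():
    print(f"{name:<6}{value:>12.6f}")
```

      Under that assumption the 25%, 50%, and 75% rows come out as 59.25, 117.5, and 175.75, matching the pandas output. An engine that computes approximate percentiles, or that does not interpolate between data points, could report slightly different quartiles for the same data.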
      

              People

              • Assignee: Andrew Ray (a1ray)
              • Reporter: Andrew Ray (a1ray)
              • Votes: 0
              • Watchers: 3
