  1. Spark
  2. SPARK-24671

DataFrame length using a dunder/magic method in PySpark


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      In Python, if a class implements a method called `__len__`, one can use the builtin `len` function to get the length of an instance of that class, whatever "length" means in its context. This is, for example, how you get the number of rows of a pandas DataFrame.

      It should be straightforward to add this functionality to PySpark, because `df.count()` is already implemented, so the patch I'm proposing is just two lines of code (and two lines of tests). It's in this commit; I'll submit a PR shortly.

      https://github.com/kokes/spark/commit/4d0afaf3cd046b11e8bae43dc00ddf4b1eb97732
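      The idea behind the proposal can be sketched as follows. This is a minimal, hypothetical stand-in class (not the actual pyspark.sql.DataFrame) so the example runs without a Spark installation; the real patch would add only the `__len__` method, delegating to the existing `count()`:

      ```python
      class DataFrame:
          """Stand-in for pyspark.sql.DataFrame with a stubbed count()."""

          def __init__(self, rows):
              self._rows = list(rows)

          def count(self):
              # In real PySpark this triggers a distributed count job;
              # here it is computed locally for illustration.
              return len(self._rows)

          def __len__(self):
              # The essence of the proposed two-line patch:
              # delegate len(df) to the existing count().
              return self.count()


      df = DataFrame([(1, "a"), (2, "b"), (3, "c")])
      assert len(df) == df.count() == 3
      ```

      Note that `len()` requires `__len__` to return a non-negative int, so an empty DataFrame yields `len(df) == 0` just as `count()` does.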

      Attachments

        Issue Links

          Activity

            People

              Assignee: Unassigned
              Reporter: Ondrej Kokes (ondrej)
              Votes: 1
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: