Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
- Affects Version/s: 2.3.1
- Fix Version/s: None
- Component/s: None
Description
In Python, if a class implements a method called `__len__`, one can use the builtin `len` function to get the length of an instance of that class, whatever length means in its context. This is, for example, how you get the number of rows of a pandas DataFrame.
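A minimal illustration of the protocol described above, using a hypothetical stand-in class: defining `__len__` is all it takes for the builtin `len` to work on instances.

```python
class RowBatch:
    """A toy container; the name and class are illustrative only."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        # The builtin len() delegates to this method.
        return len(self.rows)

batch = RowBatch(["r1", "r2"])
print(len(batch))  # → 2
```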
It should be straightforward to add this functionality to PySpark, because `df.count()` is already implemented, so the patch I'm proposing is just two lines of code (plus two lines of tests). It's in this commit; I'll submit a PR shortly.
https://github.com/kokes/spark/commit/4d0afaf3cd046b11e8bae43dc00ddf4b1eb97732
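The proposal can be sketched as follows; this is a simplified stand-in (not the actual PySpark `DataFrame`, which is assumed here only by analogy), showing how a `__len__` that delegates to the existing `count()` would make `len(df)` work.

```python
class DataFrame:
    """Stand-in for pyspark.sql.DataFrame, for illustration only."""

    def __init__(self, rows):
        self._rows = list(rows)

    def count(self):
        # In real PySpark this triggers a Spark job; here it is trivial.
        return len(self._rows)

    def __len__(self):
        # The essence of the proposed patch: delegate to count().
        return self.count()

df = DataFrame([("a", 1), ("b", 2), ("c", 3)])
print(len(df))  # → 3
```

With this in place, `len(df)` and `df.count()` return the same value, mirroring the pandas convention where `len` gives the number of rows.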
Issue Links
- contains: SPARK-28172 pyspark DataFrame equality operator (Resolved)
- links to