Description
Right now there is no clean way to check whether an RDD is empty, as discussed here: http://apache-spark-user-list.1001560.n3.nabble.com/Testing-if-an-RDD-is-empty-td1678.html#a1679
I'd like a method rdd.isEmpty that returns a boolean.
This would be especially useful with streams. Sometimes the batches in a stream are huge, and sometimes I get nothing for hours, yet I still have to run count() to check whether there is anything in the RDD. I can process an empty RDD like the others, but it would be more efficient to just skip the empty ones.
I can also run first() and catch the exception, but this is neither a clean nor a fast solution.
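Until something like rdd.isEmpty exists, a possible stopgap is to ask for at most one element instead of counting everything. The following is only a minimal sketch, assuming PySpark and its RDD.take(num) call, which returns a list of at most num elements. The names is_empty and process_if_nonempty are hypothetical helpers for illustration, not existing Spark methods.

```python
def is_empty(rdd):
    """Return True when the RDD has no elements.

    Assumption: `rdd` exposes PySpark's RDD.take(num), which returns a list
    holding at most `num` elements. Asking for one element avoids the full
    pass over the data that count() requires.
    """
    return len(rdd.take(1)) == 0


def process_if_nonempty(rdd, handler):
    """Call handler(rdd) only for non-empty RDDs and report whether it ran."""
    if is_empty(rdd):
        return False  # empty batch: skip the processing entirely
    handler(rdd)
    return True
```

With a stream, this could presumably be wired up as stream.foreachRDD(lambda rdd: process_if_nonempty(rdd, handle_batch)), where stream and handle_batch are placeholders for an existing DStream and the per-batch processing function.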