Spark / SPARK-5270

Provide isEmpty() function in RDD API


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels: None
    • Environment: Centos 6

    Description

      Right now there is no clean way to check whether an RDD is empty, as discussed here: http://apache-spark-user-list.1001560.n3.nabble.com/Testing-if-an-RDD-is-empty-td1678.html#a1679

      I'd like a method rdd.isEmpty that returns a boolean.

      This would be especially useful with streams. Sometimes the batches in one stream are huge, and sometimes I get nothing for hours. Either way, I have to run count() to check whether there is anything in the RDD. I can process an empty RDD like the others, but it would be more efficient to just skip the empty ones.

      I can also run first() and catch the exception, but that is neither a clean nor a fast solution.
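
To make the trade-off between these options concrete, here is a minimal sketch in plain Python. It does not use Spark: `ToyRDD`, `records_read`, and the helper functions are hypothetical names invented for illustration, and the class only mimics the shape of the `count()`, `first()`, and `take()` calls mentioned above. The `is_empty` method follows the commonly used "no partitions, or `take(1)` returns nothing" idea and should be read as one possible approach under those assumptions.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: a list of partitions, each a list of records."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.records_read = 0  # how many records the actions have touched so far

    def count(self):
        # Visits every record in every partition.
        total = 0
        for part in self.partitions:
            for _ in part:
                self.records_read += 1
                total += 1
        return total

    def take(self, n):
        # Stops scanning as soon as n records have been collected.
        out = []
        if n <= 0:
            return out
        for part in self.partitions:
            for rec in part:
                self.records_read += 1
                out.append(rec)
                if len(out) == n:
                    return out
        return out

    def first(self):
        # Raises on an empty data set (ValueError is this sketch's choice).
        got = self.take(1)
        if not got:
            raise ValueError("RDD is empty")
        return got[0]

    def is_empty(self):
        # The requested check: no partitions at all, or no first record.
        return len(self.partitions) == 0 or len(self.take(1)) == 0


def is_empty_via_count(rdd):
    # Workaround 1: correct, but reads the whole data set.
    return rdd.count() == 0


def is_empty_via_first(rdd):
    # Workaround 2: cheap, but uses an exception for control flow.
    try:
        rdd.first()
        return False
    except ValueError:
        return True


# Skip empty batches, as in the streaming use case described above.
batches = [ToyRDD([[], []]), ToyRDD([[], [1, 2, 3], [4, 5]])]
non_empty = [b for b in batches if not b.is_empty()]
print(len(non_empty), "batch(es) left to process")
```

In this toy model, `is_empty_via_count` touches all five records of the second batch, while `is_empty` stops after the first record it finds, which is the efficiency gain the request is after.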

          People

            Assignee: srowen Sean R. Owen
            Reporter: alrocks47 Al M
            Votes: 0
            Watchers: 2
