SPARK-1839: PySpark take() does not launch a Spark job when it has to


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: PySpark
    • Labels: None

    Description

  If you call take() or first() on a large FilteredRDD, the driver scans all partitions locally to find the first matching items instead of launching a Spark job. If the RDD is large, the call can fail or hang.
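The behavior above can be illustrated with a minimal pure-Python sketch (not the actual PySpark source; the function name, partition layout, and predicate are illustrative assumptions). It mimics a driver that pulls partitions one at a time and filters them locally: with a sparse predicate, many partitions must be scanned before n matches are found, which is why this should run as a distributed job instead.

```python
# Hypothetical sketch of a driver-side take() over a filtered dataset.
# Not PySpark internals: it only models scanning partitions sequentially
# in the driver until n matching items have been collected.

def take_scanning_partitions(partitions, predicate, n):
    """Collect up to n items matching predicate, scanning partitions in order.

    Returns (items, partitions_scanned) so the scan cost is visible.
    """
    taken = []
    scanned = 0
    for part in partitions:
        scanned += 1
        for item in part:
            if predicate(item):
                taken.append(item)
                if len(taken) == n:
                    return taken, scanned
    return taken, scanned

# 100 partitions of 1000 ints each; the filter keeps only multiples of 50000,
# so the second match does not appear until the 51st partition.
parts = [range(i * 1000, (i + 1) * 1000) for i in range(100)]
result, scanned = take_scanning_partitions(parts, lambda x: x % 50000 == 0, 2)
# result is [0, 50000]; 51 of 100 partitions were scanned in the driver.
```

With real cluster-sized partitions, every scanned partition means shipping its data to the driver, so a sparse filter turns take(2) into a near-full scan performed on a single machine.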


          People

            Assignee: Aaron Davidson (ilikerps)
            Reporter: Hossein Falaki (falaki)
            Votes: 0
            Watchers: 1
