Spark / SPARK-4824

Join should use `Iterator` rather than `Iterable`

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

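      In Scala, `map` and `flatMap` on a strict `Iterable` are evaluated eagerly and copy every result into a new collection, whereas the same operations on an `Iterator` are lazy and produce elements only when consumed. For example: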

        val iterable = Seq(1, 2, 3).map(v => {
          println(v)
          v
        })
        println("Iterable map done")
      
        val iterator = Seq(1, 2, 3).iterator.map(v => {
          println(v)
          v
        })
        println("Iterator map done")
      

      This prints:

      1
      2
      3
      Iterable map done
      Iterator map done
      

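      The `Iterable` version prints every element before "Iterable map done", while the `Iterator` version prints nothing because the iterator is never consumed. So join should use `iterator` to avoid materializing intermediate collections and reduce the memory it consumes.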
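
      The difference can be sketched for the pairing step of a join. This is a minimal illustration, not Spark's actual implementation: `JoinSketch`, `eagerPairs`, and `lazyPairs` are hypothetical names, and the two input collections stand in for the values grouped under one key on each side of the join.

```scala
object JoinSketch {
  // Eager: the for-comprehension over Iterables builds the whole
  // cross product as a new collection before returning.
  def eagerPairs[V, W](vs: Iterable[V], ws: Iterable[W]): Iterable[(V, W)] =
    for (v <- vs; w <- ws) yield (v, w)

  // Lazy: the same comprehension over iterators yields pairs on demand,
  // so the full cross product is never held in memory at once.
  def lazyPairs[V, W](vs: Iterable[V], ws: Iterable[W]): Iterator[(V, W)] =
    for (v <- vs.iterator; w <- ws.iterator) yield (v, w)

  def main(args: Array[String]): Unit = {
    val vs = Seq(1, 2)
    val ws = Seq("a", "b")

    val it = lazyPairs(vs, ws)
    println(it.next())               // prints (1,a); later pairs are not computed yet
    println(eagerPairs(vs, ws).size) // prints 4; all pairs were materialized
  }
}
```

      Both versions yield the same pairs in the same order; only the point at which they are computed, and how many are held at once, differs.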

      Found by johannes.simon

            People

              Assignee: Unassigned
              Reporter: Shixiong Zhu (zsxwing)