Spark / SPARK-21401

Add poll function for BoundedPriorityQueue

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: ML, MLlib
    • Labels:
      None

      Description

      Most usages of BoundedPriorityQueue in ML/MLlib follow the same pattern:
      get the contents of the queue, then sort them.
      For example, Word2Vec uses pq.toSeq.sortBy(-_._2),
      and ALS uses pq.toArray.sorted().

      Test results show that using pq.poll() is much faster than sorting the values.
      For example, in PR https://github.com/apache/spark/pull/18624,
      we get the sorted values of pq with the following code:

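      var size = pq.size
      while (size > 0) {
        size -= 1
        // Each poll removes and returns the current head of the queue,
        // so the elements come out in sorted order.
        val factor = pq.poll
      }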

      Using the commonly used approach, pq.toArray.sorted(), to get the sorted values of pq instead results in about a 10% performance reduction.
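
      The two approaches being compared can be sketched as follows. This is only an illustration: it uses java.util.PriorityQueue directly as a stand-in for BoundedPriorityQueue, and the object and method names (DrainVsSort, viaSort, viaPoll) are hypothetical, not taken from the PR.

```scala
import java.util.{PriorityQueue => JPriorityQueue}

object DrainVsSort {
  // Approach 1: copy the queue contents out, then sort the copy.
  // The queue itself is left unchanged.
  def viaSort(pq: JPriorityQueue[Int]): Array[Int] = {
    val buf = Array.newBuilder[Int]
    val it = pq.iterator()
    while (it.hasNext) buf += it.next()
    buf.result().sorted
  }

  // Approach 2: repeatedly poll the head of the queue.
  // Elements come out in ascending order; this empties the queue.
  def viaPoll(pq: JPriorityQueue[Int]): Array[Int] = {
    val out = new Array[Int](pq.size)
    var i = 0
    while (i < out.length) {
      out(i) = pq.poll()
      i += 1
    }
    out
  }
}
```

      Both methods return the same ordering. The difference is that polling consumes the queue, so it only fits call sites that do not need the queue afterwards.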

      It would be good to add a poll function to BoundedPriorityQueue, since many usages of the queue need its values in sorted order.
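
      As a rough sketch of what such a function could look like, assume a bounded queue that wraps java.util.PriorityQueue and keeps only the largest maxSize elements. The class SimpleBoundedQueue below is a hypothetical, simplified stand-in written for this illustration, not Spark's actual BoundedPriorityQueue.

```scala
import java.util.{PriorityQueue => JPriorityQueue}

// Hypothetical, simplified stand-in for a bounded priority queue.
class SimpleBoundedQueue[A](maxSize: Int)(implicit ord: Ordering[A]) {
  // Min-heap under `ord`: the head is always the smallest retained element.
  private val underlying = new JPriorityQueue[A](maxSize, ord)

  def size: Int = underlying.size

  // Keep only the `maxSize` largest elements seen so far.
  def +=(elem: A): this.type = {
    if (underlying.size < maxSize) {
      underlying.offer(elem)
    } else if (ord.gt(elem, underlying.peek())) {
      // The new element beats the current smallest: replace it.
      underlying.poll()
      underlying.offer(elem)
    }
    this
  }

  // Remove and return the smallest retained element (null if the queue is empty).
  def poll(): A = underlying.poll()
}
```

      With maxSize = 3 and the inputs 5, 1, 4, 2, 3 added in that order, the queue retains 3, 4 and 5, and successive calls to poll() return them in ascending order.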

            People

            • Assignee: Peng Meng (peng.meng@intel.com)
            • Reporter: Peng Meng (peng.meng@intel.com)
            • Votes: 0
            • Watchers: 2
