
[SPARK-30154] PySpark UDF to convert MLlib vectors to dense arrays


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: ML, MLlib, PySpark
    • Labels: None
    • Target Version/s:

      Description

      If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame into dense arrays, an efficient approach is to do the conversion in the JVM. However, that requires the PySpark user to write Scala code and register it as a UDF, which is often infeasible for a pure Python project.
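
      For context, the status-quo workaround in a pure-Python project is a row-at-a-time Python UDF, which deserializes every vector in the Python workers. A minimal sketch, assuming a DataFrame df with an MLlib vector column named "features" (the column names here are illustrative):

      from pyspark.sql.functions import udf
      from pyspark.sql.types import ArrayType, DoubleType

      # Row-at-a-time Python UDF: each Vector is converted via
      # Vector.toArray() in the Python worker; this is the slow path
      # that a JVM-side converter would avoid.
      to_array = udf(lambda v: v.toArray().tolist(), ArrayType(DoubleType()))
      df = df.withColumn("features_arr", to_array("features"))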

      What we can do is predefine those converters in Scala and expose them in PySpark, e.g.:

      from pyspark.ml.functions import vector_to_dense_array
      from pyspark.sql.functions import col

      df.select(vector_to_dense_array(col("features")))
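
      For completeness, an end-to-end sketch of the proposed usage. The name vector_to_dense_array above is the proposal's placeholder; the converter that pyspark.ml.functions exposes as of Spark 3.0 (this ticket's fix version) is vector_to_array, which the snippet below uses. The column and alias names are illustrative:

      from pyspark.ml.functions import vector_to_array
      from pyspark.ml.linalg import Vectors
      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col

      spark = SparkSession.builder.getOrCreate()
      df = spark.createDataFrame(
          [(Vectors.dense([1.0, 2.0, 3.0]),),
           (Vectors.sparse(3, {0: 4.0}),)],
          ["features"])

      # The conversion runs in the JVM; no per-row Python UDF round trip.
      df.select(vector_to_array(col("features")).alias("features_arr")).show()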
      

      cc: Weichen Xu


              People

              • Assignee: Weichen Xu (weichenxu123)
              • Reporter: Xiangrui Meng (mengxr)
              • Votes: 0
              • Watchers: 3
