Spark / SPARK-30154

PySpark UDF to convert MLlib vectors to dense arrays


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: ML, MLlib, PySpark
    • Labels: None

    Description

      If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame into dense arrays, an efficient approach is to do the conversion in the JVM. However, this requires the PySpark user to write Scala code and register it as a UDF, which is often infeasible for a pure Python project.

      What we can do is predefine those converters in Scala and expose them in PySpark, e.g.:

      from pyspark.sql.functions import col
      from pyspark.ml.functions import vector_to_dense_array
      
      df.select(vector_to_dense_array(col("features")))
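
      As a fuller illustration, here is a minimal, self-contained sketch of the intended usage. The converter shipped in Spark 3.0.0 as pyspark.ml.functions.vector_to_array (a slightly different name from the snippet above); the example DataFrame contents and the features_arr alias are only illustrative:

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col
      from pyspark.ml.linalg import Vectors
      from pyspark.ml.functions import vector_to_array
      
      spark = SparkSession.builder.getOrCreate()
      
      # A DataFrame mixing dense and sparse MLlib vectors in one column.
      df = spark.createDataFrame(
          [(Vectors.dense([1.0, 2.0, 3.0]),),
           (Vectors.sparse(3, [0, 2], [4.0, 5.0]),)],
          ["features"])
      
      # Convert the vector column to an array<double> column; the conversion
      # happens in the JVM, so no Python UDF is involved.
      df.select(vector_to_array(col("features")).alias("features_arr")).show(truncate=False)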
      

      cc: weichenxu123


            People

              Assignee: Weichen Xu (weichenxu123)
              Reporter: Xiangrui Meng (mengxr)
              Votes: 0
              Watchers: 2
