[SPARK-12806] Support SQL expressions extracting values from VectorUDT - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: MLlib, SQL
Labels:
- bulk-closed

Description

Some use cases require extracting the element at a specific index from a VectorUDT column of a DataFrame. For example, to compute losses we may want to extract a particular class probability from the probabilityCol produced by a LogisticRegression model. However, if probability is a VectorUDT column of df, the following code fails:

df.select("probability.0")

AnalysisException: u"Can't extract value from probability"

This exception is thrown from sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala.

VectorUDT is essentially a wrapper around a StructType, so one would expect it to support value-extraction expressions in an analogous way.
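As a point of reference, the struct that VectorUDT serializes to has the fields type (0 for sparse, 1 for dense), size, indices and values. Even if those struct fields were addressable directly, values is only the raw storage array: for a sparse vector, logical element i is generally not values[i]. The plain-Python sketch below (no Spark required) illustrates the lookup an index extractor would need to perform on that struct. VectorStruct and extract_element are hypothetical names introduced for illustration; they are not Spark APIs.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional, Sequence


class VectorStruct(NamedTuple):
    """Hypothetical mirror of the struct VectorUDT serializes to."""
    type: int                          # 0 = sparse, 1 = dense
    size: Optional[int]                # vector length (sparse vectors only)
    indices: Optional[Sequence[int]]   # sorted non-zero positions (sparse only)
    values: Sequence[float]            # stored doubles


def extract_element(row: VectorStruct, i: int) -> float:
    """Return logical element i of the vector encoded by `row` (hypothetical helper)."""
    if row.type == 1:
        size = len(row.values)          # dense: every element is stored
    elif row.type == 0:
        size = row.size                 # sparse: length is stored explicitly
    else:
        raise ValueError(f"unknown vector type {row.type}")

    if not 0 <= i < size:
        raise IndexError(f"index {i} out of range for vector of size {size}")

    if row.type == 1:
        return float(row.values[i])

    # Sparse: binary-search the sorted indices for position i.
    pos = bisect_left(row.indices, i)
    if pos < len(row.indices) and row.indices[pos] == i:
        return float(row.values[pos])
    return 0.0                          # positions not listed are implicit zeros
```

For instance, a dense probability vector [0.2, 0.8] yields 0.8 at index 1, while a sparse vector of size 4 with indices [1, 3] yields 0.0 at index 0 because that position is not stored.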

Issue Links

relates to

SPARK-19653 `Vector` Type Should Be A First-Class Citizen In Spark SQL

Resolved

People

Assignee: Unassigned

Reporter: Feynman Liang

Votes: 5

Watchers: 4

Dates

Created: 13/Jan/16 15:05

Updated: 01/Nov/19 00:40

Resolved: 21/May/19 04:15