[SPARK-21926] Compatibility between ML Transformers and Structured Streaming - ASF JIRA

Details

Type: Umbrella
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.3.0
Component/s: ML, Structured Streaming
Labels: None

Description

We've run into a few cases where ML components don't work with streaming DataFrames (for prediction). This ticket aggregates the known cases in one place and provides a place to discuss possible fixes.

Failing cases:
1) VectorAssembler where one of the inputs is a VectorUDT column with no metadata.
Possible fixes:
See ~~SPARK-22346~~ for details.

2) OneHotEncoder where the input is a column with no metadata.
Possible fixes:
a) Make OneHotEncoder an estimator (~~SPARK-13030~~).
~~b) Allow user to set the cardinality of OneHotEncoder.~~
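
Both failing cases come down to the same thing: the transformer needs a size (vector length or number of categories) that is normally read from column metadata, and on a stream it cannot be recovered by scanning the data. The sketch below illustrates this in plain Python, not with the Spark API. All names in it (`infer_cardinality`, `FittedOneHot`, `assemble`) are hypothetical and exist only for this illustration; it assumes category indices are non-negative integers.

```python
from typing import List, Sequence


def infer_cardinality(batch: Sequence[int]) -> int:
    """Guess the number of categories by scanning the data (needs a full pass)."""
    return max(batch) + 1


def one_hot(index: int, cardinality: int) -> List[float]:
    """Encode a category index as a fixed-width indicator vector."""
    if not 0 <= index < cardinality:
        raise ValueError(f"index {index} outside [0, {cardinality})")
    vec = [0.0] * cardinality
    vec[index] = 1.0
    return vec


class FittedOneHot:
    """Estimator-style encoder: the cardinality is learned once, then frozen."""

    def __init__(self, cardinality: int) -> None:
        self.cardinality = cardinality

    @classmethod
    def fit(cls, training: Sequence[int]) -> "FittedOneHot":
        return cls(infer_cardinality(training))

    def transform(self, batch: Sequence[int]) -> List[List[float]]:
        return [one_hot(i, self.cardinality) for i in batch]


def assemble(columns: Sequence[Sequence[float]],
             declared_sizes: Sequence[int]) -> List[float]:
    """Concatenate vector columns, checking each against its declared size."""
    out: List[float] = []
    for col, size in zip(columns, declared_sizes):
        if len(col) != size:
            raise ValueError(f"expected vector of size {size}, got {len(col)}")
        out.extend(col)
    return out


training = [0, 1, 2, 3]
micro_batches = [[0, 1], [3], [2, 0]]

# Inferring the width per micro-batch gives a different answer each time.
naive_widths = [infer_cardinality(b) for b in micro_batches]

# Fitting once on static data fixes the width for every later batch.
model = FittedOneHot.fit(training)
fitted_widths = [len(model.transform(b)[0]) for b in micro_batches]

# With sizes declared up front, the assembled width is known without a scan.
assembled = assemble([[1.0, 2.0], [3.0]], declared_sizes=[2, 1])
```

Here `naive_widths` differs from batch to batch while `fitted_widths` stays constant, which is the argument for option a) above. The `declared_sizes` argument plays the role that column metadata plays for case 1.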

Issue Links

contains

SPARK-22888 OneVsRestModel does not work with Structured Streaming

Resolved

SPARK-24465 LSHModel should support Structured Streaming for transform

Resolved

SPARK-22644 Make ML testsuite support StructuredStreaming test

Resolved

is related to

SPARK-23037 RFormula should not use deprecated OneHotEncoder and should include VectorSizeHint in pipeline

Resolved

SPARK-22346 Update VectorAssembler to work with Structured Streaming

Resolved

SPARK-21748 Migrate the implementation of HashingTF from MLlib to ML

Resolved

relates to

SPARK-19141 VectorAssembler metadata causing memory issues

Resolved

SPARK-13030 Change OneHotEncoder to Estimator

Resolved

SPARK-22735 Add VectorSizeHint to ML features documentation

Resolved

SPARK-23048 Update mllib docs to replace OneHotEncoder with OneHotEncoderEstimator

Resolved

People

Assignee: Unassigned

Reporter: Bago Amirbekian

Votes: 3

Watchers: 16

Dates

Created: 05/Sep/17 21:08

Updated: 22/Jun/18 18:41

Resolved: 22/Jun/18 18:41