[SPARK-9062] Change output type of Tokenizer to Array(String, true) - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: ML
Labels: None

Description

Currently, the output type of Tokenizer is Array(String, false), which is incompatible with Word2Vec and other transformers because their input type is Array(String, true). A Seq[String] returned from a udf is treated as Array(String, true) by default.
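
The mismatch described above can be illustrated with a small toy model. This is a hedged sketch in plain Python, not Spark code: `ArrayType`, `strict_check`, and `lenient_check` are hypothetical names introduced here, and the sketch assumes the downstream stage validates its input column by exact type equality, so that the second field (whether the array may contain nulls) has to match as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Toy stand-in for an array column type (hypothetical, not the Spark class)."""
    element_type: str
    contains_null: bool


def strict_check(actual: ArrayType, expected: ArrayType) -> bool:
    # Assumed validation style: the types must be exactly equal,
    # including the contains_null flag.
    return actual == expected


def lenient_check(actual: ArrayType, expected: ArrayType) -> bool:
    # Alternative: element types must match, and an array that never
    # contains nulls is accepted wherever nulls are tolerated.
    return (actual.element_type == expected.element_type
            and (expected.contains_null or not actual.contains_null))


# Tokenizer output before and after the change proposed in this issue.
tokenizer_before = ArrayType("string", contains_null=False)
tokenizer_after = ArrayType("string", contains_null=True)

# Input type expected by Word2Vec, as stated in the description.
word2vec_input = ArrayType("string", contains_null=True)

print(strict_check(tokenizer_before, word2vec_input))   # False
print(strict_check(tokenizer_after, word2vec_input))    # True
print(lenient_check(tokenizer_before, word2vec_input))  # True
```

Under that assumption, changing the Tokenizer output to Array(String, true) makes the strict comparison pass. Relaxing the check on the consumer side, as in `lenient_check`, is a different way to address the same mismatch.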

For nullable columns, I'm also wondering whether Tokenizer should return Array(null) when the input value is null.

Issue Links

is cloned by: SPARK-10835 Word2Vec should accept non-null string array, in addition to existing null string array (Resolved)

links to: [Github] Pull Request #7414 (hhbyyh)

People

Assignee: yuhao yang
Reporter: yuhao yang
Shepherd: Joseph K. Bradley
Votes: 0
Watchers: 4

Dates

Created: 15/Jul/15 06:38
Updated: 20/Sep/16 14:45
Resolved: 17/Jul/15 20:44