Description
Currently the output type of Tokenizer is Array(String, false), i.e. ArrayType(StringType, containsNull = false). This is not compatible with Word2Vec and other transformers whose expected input type is Array(String, true). A Seq[String] returned from a udf is treated as Array(String, true) by default.
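A minimal sketch of why the two types clash, assuming Spark SQL's `ArrayType(elementType, containsNull)` case class. Because the schema check is assumed to compare data types for exact equality, the `containsNull` flag alone makes them differ. `strictCheck` and `relaxedCheck` are hypothetical helpers written for illustration, not Spark APIs. The relaxed rule (accept a non-null array wherever a nullable one is expected) is roughly what the cloned issue SPARK-10835 asks Word2Vec to do.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

object ContainsNullMismatch {
  // Tokenizer output as described in this issue: elements are declared non-null.
  val tokenizerOutput: DataType = ArrayType(StringType, containsNull = false)
  // Expected input of Word2Vec (and what a udf returning Seq[String] infers).
  val word2VecInput: DataType = ArrayType(StringType, containsNull = true)

  // Hypothetical strict check: exact equality, so containsNull must match too.
  def strictCheck(actual: DataType, expected: DataType): Boolean =
    actual == expected

  // Hypothetical relaxed check: a non-null array is a valid value of a
  // nullable array type, so only the element types need to agree.
  def relaxedCheck(actual: DataType, expected: DataType): Boolean =
    (actual, expected) match {
      case (ArrayType(a, _), ArrayType(e, true)) => a == e
      case _                                     => actual == expected
    }

  def main(args: Array[String]): Unit = {
    println(strictCheck(tokenizerOutput, word2VecInput))  // false
    println(relaxedCheck(tokenizerOutput, word2VecInput)) // true
  }
}
```

The relaxed rule is one-directional: an Array(String, true) column is still rejected where Array(String, false) is required, since it may actually contain nulls.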
I'm also thinking that, for nullable columns, Tokenizer should perhaps return Array(null) for a null value in the input.
Issue Links
- is cloned by: SPARK-10835 Word2Vec should accept non-null string array, in addition to existing null string array (Resolved)
- links to