We've run into a few cases where ML components don't work with streaming DataFrames (for prediction). This ticket collects the known cases in one place and provides a place to discuss possible fixes.
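1) VectorAssembler where one of the inputs is a VectorUDT column with no metadata. Without the metadata, VectorAssembler cannot determine the vector's size without looking at the data, which it cannot do on a streaming DataFrame.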
More details here
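2) OneHotEncoder where the input is a column with no metadata. Without the metadata, the number of categories has to be computed from the data, which is not possible on a streaming DataFrame. Possible fixes: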
a) Make OneHotEncoder an estimator (
b) Allow user to set the cardinality of OneHotEncoder.
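To illustrate why option b) would help, here is a minimal sketch in plain Python. It is not Spark code and does not use Spark's API; the function names, the two batches, and the cardinality value are all hypothetical. It only shows that inferring the number of categories from whatever data is at hand gives a different vector width per micro-batch, while a user-supplied cardinality keeps the output width stable.

```python
from typing import Iterable, List


def infer_cardinality(batch: Iterable[int]) -> int:
    """Infer the number of categories from the data (requires a full pass)."""
    return max(batch) + 1


def one_hot(index: int, cardinality: int) -> List[float]:
    """Encode a category index as a dense one-hot vector of fixed length."""
    if not 0 <= index < cardinality:
        raise ValueError(f"index {index} outside [0, {cardinality})")
    vec = [0.0] * cardinality
    vec[index] = 1.0
    return vec


# Two micro-batches from the same hypothetical stream of category indices.
batch_1 = [0, 1]
batch_2 = [2, 0]

# Inferring per batch: the output width depends on which values happened
# to arrive, so it differs between batches.
widths_inferred = [infer_cardinality(b) for b in (batch_1, batch_2)]

# User-supplied cardinality (assumed known up front): every batch is
# encoded to the same width without scanning the data first.
CARDINALITY = 3
encoded = [[one_hot(i, CARDINALITY) for i in b] for b in (batch_1, batch_2)]
```

Option a) would reach the same result by learning the cardinality once during fitting on a batch DataFrame and reusing it at prediction time.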