Spark ML has several classes whose constructors take a Scala Array. To expose the same API in Python, a Java-friendly alternate constructor has been needed to be compatible with py4j when converting from a Python list. This is because PySpark's current _py2java conversion turns a list into a java.util.ArrayList, which does not match a constructor parameter typed as a Scala Array, as the resulting error message shows.
Creating an alternate constructor can be avoided by building a py4j JavaArray with new_array. py4j maps a JavaArray to a plain Java array on the JVM side, which is also what a Scala Array is at runtime, so this type is compatible with the Scala Array parameters currently used in classes like CountVectorizerModel and StringIndexerModel.
Most of the boilerplate Python code for this can be put in a convenience function on ml.JavaWrapper, giving a clean way to construct ML objects without adding special constructors.
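A minimal sketch of such a helper follows. The class shape and the gateway injection are illustrative (in PySpark the gateway would come from the active SparkContext); only gateway.new_array and JavaArray item assignment are real py4j API.

```python
class JavaWrapper:
    """Illustrative stand-in for pyspark.ml's JavaWrapper (sketch only)."""

    def __init__(self, gateway):
        # In PySpark the py4j gateway would come from the active
        # SparkContext; it is injected here so the sketch is self-contained.
        self._gateway = gateway

    def _new_java_array(self, py_list, java_class):
        """Copy a Python list into a py4j JavaArray of the given element class.

        py4j creates a real Java array on the JVM side, which is exactly what
        a Scala Array[T] parameter is at runtime, so the result can be passed
        directly to a constructor expecting a Scala Array -- no alternate
        java.util.List-based constructor is needed.
        """
        java_array = self._gateway.new_array(java_class, len(py_list))
        for i, value in enumerate(py_list):
            java_array[i] = value
        return java_array
```

With a live gateway, a vocabulary for something like CountVectorizerModel could then be converted with a call along the lines of wrapper._new_java_array(vocab, gateway.jvm.java.lang.String) before invoking the JVM constructor.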