Description
Bug Description
The `catalogString` is not detailed enough to distinguish between pyspark.ml.linalg.Vectors and pyspark.mllib.linalg.Vectors.
How to reproduce the bug
Take the MinMaxScaler example from the official documentation (Python code). Keep all other lines untouched and modify only the Vectors import line, as follows:
# from pyspark.ml.linalg import Vectors
from pyspark.mllib.linalg import Vectors
Or you can directly execute the following code snippet:
from pyspark.ml.feature import MinMaxScaler
# from pyspark.ml.linalg import Vectors
from pyspark.mllib.linalg import Vectors

dataFrame = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.1, -1.0]),),
    (1, Vectors.dense([2.0, 1.1, 1.0]),),
    (2, Vectors.dense([3.0, 10.1, 3.0]),)
], ["id", "features"])

scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
scalerModel = scaler.fit(dataFrame)
It will raise an error:
IllegalArgumentException: 'requirement failed: Column features must be of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'
However, the actual struct and the required struct are printed as exactly the same string, so the message gives the programmer no useful information. I would suggest making the catalogString distinguish between pyspark.ml.linalg.Vectors and pyspark.mllib.linalg.Vectors.
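To illustrate the suggestion, here is a minimal, Spark-free sketch of the idea. `ToyVectorUDT`, `mismatch_message`, and `qualified_name` are hypothetical names invented for this illustration and are not part of Spark. The sketch only assumes that both vector types share the struct layout shown in the error above, and it is not meant to reflect how Spark actually implements the check.

```python
# Struct layout copied from the error message above; both vector types share it.
SQL_TYPE = "struct<type:tinyint,size:int,indices:array<int>,values:array<double>>"


class ToyVectorUDT:
    """Hypothetical stand-in for a vector user-defined type (not a Spark class)."""

    def __init__(self, module):
        # Package that defines the type, e.g. "pyspark.ml.linalg".
        self.module = module

    def catalog_string(self):
        # Describes only the underlying storage, so both packages look identical.
        return SQL_TYPE

    def qualified_name(self):
        # Assumed improvement: include the defining package in the description.
        return f"{self.module}.VectorUDT"


def mismatch_message(col_name, actual, expected, describe):
    """Build a type-mismatch message using the given description function."""
    return (
        f"Column {col_name} must be of type {describe(expected)} "
        f"but was actually {describe(actual)}."
    )


ml = ToyVectorUDT("pyspark.ml.linalg")
mllib = ToyVectorUDT("pyspark.mllib.linalg")

# Current behaviour (modelled): both sides of the message are the same string.
ambiguous = mismatch_message("features", mllib, ml, ToyVectorUDT.catalog_string)
# Suggested behaviour (modelled): the defining package makes the mismatch visible.
qualified = mismatch_message("features", mllib, ml, ToyVectorUDT.qualified_name)

print(ambiguous)
print(qualified)
```

In this toy model the first message repeats the same struct string twice, while the second names the two different packages, which is the kind of distinction the report asks for.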
Thanks!