Description
The toDebugString attribute is missing from DecisionTreeClassifier and DecisionTreeClassificationModel in pyspark.ml. It exists on the MLlib DecisionTreeModel.
As a result, there is no way to inspect or print the structure of a fitted tree from the ML API.
A minimal example:
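from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Stage definitions were not shown in the report; the input column names
# below ('target', 'f1', 'f2') are placeholders.
target_index = StringIndexer(inputCol='target', outputCol='target_idx')
assembler = VectorAssembler(inputCols=['f1', 'f2'], outputCol='features')
cl = DecisionTreeClassifier(labelCol='target_idx', featuresCol='features')
pipe = Pipeline(stages=[target_index, assembler, cl])
model = pipe.fit(df_train)

# Prediction and model evaluation
predictions = model.transform(df_test)
mc_evaluator = MulticlassClassificationEvaluator(
    labelCol="target_idx", predictionCol="prediction", metricName="precision")
accuracy = mc_evaluator.evaluate(predictions)
print("Test Error = {}".format(1.0 - accuracy))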
Now it would be great to be able to do what is already possible with the MLlib model:
print model.toDebugString(), # it already has newline
DecisionTreeModel classifier of depth 1 with 3 nodes
  If (feature 0 <= 0.0)
   Predict: 0.0
  Else (feature 0 > 0.0)
   Predict: 1.0
But there is no toDebugString attribute on either the pipeline model or the DecisionTreeClassifier model:
cl.toDebugString()
AttributeError
https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/mllib/tree.html
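Until the attribute is exposed in Python, a possible stopgap is to reach through to the JVM model. The helper below is only a sketch: `tree_debug_string` is a hypothetical name, and it assumes the ML wrapper keeps its JVM counterpart in the internal `_java_obj` attribute and that the JVM model offers `toDebugString`. Both are implementation details that may differ between Spark versions.

```python
def tree_debug_string(model):
    """Best-effort textual description of a fitted tree model, or None."""
    # Use a Python-side toDebugString if the wrapper exposes one
    # (as the MLlib DecisionTreeModel does).
    attr = getattr(model, "toDebugString", None)
    if attr is not None:
        return attr() if callable(attr) else attr
    # Assumption: the ML wrapper stores its JVM counterpart in `_java_obj`
    # and the JVM model provides toDebugString. Both are internal details.
    java_obj = getattr(model, "_java_obj", None)
    if java_obj is not None:
        return java_obj.toDebugString()
    # Nothing usable was found on this object.
    return None
```

With the pipeline above, the fitted tree would be the last stage, so the call would look like `tree_debug_string(model.stages[-1])`.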
Issue Links
- is part of SPARK-15139 PySpark TreeEnsemble missing methods (Resolved)