[SPARK-29418] Mismatched indices between input and featureImportances is at best extremely confusing - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.4.4
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed
Environment:

I'm on AWS but I presume this is happening everywhere.

Description

When you read in a "libsvm" file, the feature indices must be one-based, so lines look like this:

37.0 1:1.0 2:2.75

But when you fit something like RandomForestRegressor and then look at the feature importances, the indices are zero-based:

model.stages[-1].featureImportances

SparseVector(144, {0: 0.0292, 1: 0.0041})

I guess you can add one to make them line up, but why force us to do that? Either accept zero-based indices in libsvm files (easiest) or have featureImportances report indices that match the input file.
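For reference, the add-one workaround can be sketched in plain Python. This is only an illustration, not part of Spark: the helper name `to_libsvm_indices` is hypothetical, and the sample numbers are the two entries quoted above. With a fitted model, the `indices` and `values` attributes of the `featureImportances` SparseVector would be passed in instead.

```python
def to_libsvm_indices(indices, values):
    """Shift zero-based vector indices to the one-based indices of a libsvm file.

    Hypothetical helper for illustration; it is not a Spark API.
    """
    return {int(i) + 1: float(v) for i, v in zip(indices, values)}


# Sample values taken from the report above. With a real model one would use:
#   importances = model.stages[-1].featureImportances
#   to_libsvm_indices(importances.indices, importances.values)
shifted = to_libsvm_indices([0, 1], [0.0292, 0.0041])
print(shifted)  # {1: 0.0292, 2: 0.0041}
```

After the shift, key 1 corresponds to the `1:1.0` column of the input line and key 2 to the `2:2.75` column.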

People

Assignee: Unassigned

Reporter: David Kravitz

Votes: 0

Watchers: 1

Dates

Created: 09/Oct/19 19:26

Updated: 25/May/21 01:54

Resolved: 25/May/21 01:43