Details
- Improvement
- Status: Closed
- Minor
- Resolution: Fixed
- None
- None
Description
The current MultipleAdditiveTree model doesn't support missing feature values.
When a feature value is not passed, the model silently treats it as zero.
Other LTR libraries, such as xgboost, can distinguish missing values from all other values, including zero. They learn how to treat missing values at training time and add a "missing" branch to each split, pointing in the direction learned to be best when the value is absent.
It would be nice to bring this feature to Solr MultipleAdditiveTree models as well. An additional "missing" parameter should be added to the RegressionTreeNode to determine which direction to take when the feature value is missing.
This will allow us to differentiate between zero-valued and missing features.
For example, if the feature is "hotel_avg_review" (a rating between zero and five stars), we would like the model to behave differently when the hotel has no reviews (we do not know whether it is good) than when it has a review of zero stars (the hotel is bad).
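To make the proposal concrete, here is a minimal sketch of how a tree node could route a missing feature value. It is an illustration only and not Solr's actual RegressionTreeNode: the class name TreeNodeSketch, the missingGoesLeft flag, and the use of an absent map entry (or NaN) to mean "missing" are all assumptions made for this example.

```java
import java.util.Map;

// Hypothetical sketch; names and structure are assumptions, not Solr's API.
final class TreeNodeSketch {
    private final String feature;          // null for a leaf node
    private final float threshold;         // split point for internal nodes
    private final TreeNodeSketch left;     // taken when value <= threshold
    private final TreeNodeSketch right;    // taken when value > threshold
    private final boolean missingGoesLeft; // the proposed "missing" direction
    private final float value;             // score returned by a leaf

    private TreeNodeSketch(String feature, float threshold, TreeNodeSketch left,
                           TreeNodeSketch right, boolean missingGoesLeft, float value) {
        this.feature = feature;
        this.threshold = threshold;
        this.left = left;
        this.right = right;
        this.missingGoesLeft = missingGoesLeft;
        this.value = value;
    }

    static TreeNodeSketch leaf(float value) {
        return new TreeNodeSketch(null, 0f, null, null, false, value);
    }

    static TreeNodeSketch split(String feature, float threshold, TreeNodeSketch left,
                                TreeNodeSketch right, boolean missingGoesLeft) {
        return new TreeNodeSketch(feature, threshold, left, right, missingGoesLeft, 0f);
    }

    // Walks the tree. An absent or NaN feature follows the "missing" direction
    // instead of being converted to zero and compared with the threshold.
    float score(Map<String, Float> features) {
        if (feature == null) {
            return value;
        }
        Float v = features.get(feature);
        TreeNodeSketch next;
        if (v == null || v.isNaN()) {
            next = missingGoesLeft ? left : right;
        } else {
            next = (v <= threshold) ? left : right;
        }
        return next.score(features);
    }
}
```

With a split on "hotel_avg_review" at 2.5, a left leaf of -1.0, a right leaf of 0.5, and the missing direction set to the right, a hotel with a zero-star review scores -1.0 while a hotel with no reviews scores 0.5. Under the current behaviour both would be scored as zero stars.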