Details
Type: Umbrella
Status: Resolved
Priority: Minor
Resolution: Incomplete
Description
Investigate improvements to Spark ML feature hashing (see e.g. http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html#sklearn.feature_extraction.FeatureHasher).
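For readers unfamiliar with the technique, the following is a minimal sketch of feature hashing (the "hashing trick") in plain Python. It is not Spark's or scikit-learn's implementation: the function names `hash_index` and `hash_features` are hypothetical, and MD5 from the standard library stands in for the hash function only to keep the example self-contained. String and boolean values are treated as categorical ("name=value" hashed, weight 1.0), while numeric values are added at the bucket of the hashed column name.

```python
import hashlib


def hash_index(term: str, num_features: int) -> int:
    """Map a string to a bucket in [0, num_features).

    MD5 is an assumption made for portability, not the hash a real
    library would use.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_features


def hash_features(row: dict, num_features: int = 16) -> dict:
    """Hash a row of mixed features into a sparse {index: value} vector."""
    vec = {}
    for name, value in row.items():
        if isinstance(value, (bool, str)):
            # Categorical feature: hash "name=value" and contribute 1.0.
            idx = hash_index(f"{name}={value}", num_features)
            weight = 1.0
        else:
            # Numeric feature: hash the column name and contribute the value.
            idx = hash_index(name, num_features)
            weight = float(value)
        # Colliding features are summed into the same bucket.
        vec[idx] = vec.get(idx, 0.0) + weight
    return vec


row = {"city": "Paris", "clicks": 3, "premium": True}
print(hash_features(row, num_features=16))
```

The output dimension is fixed by `num_features` regardless of how many distinct feature values appear, at the cost of occasional collisions between unrelated features.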
Issue Links
- contains
  - SPARK-21748 Migrate the implementation of HashingTF from MLlib to ML (Resolved)
  - SPARK-21481 Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF (Resolved)
- relates to
  - SPARK-22801 Allow FeatureHasher to specify numeric columns to treat as categorical (Resolved)
1. Add binary toggle Param to ml.HashingTF | Resolved | Bryan Cutler
2. Use MurmurHash3 for hashing String features | Closed | Yanbo Liang
3. Extend input format that feature hashing can handle | Resolved | Nicholas Pentreath
4. HashingTF should extend UnaryTransformer | Resolved | Unassigned
5. HashingTF should use MurmurHash3 | Resolved | Yanbo Liang
6. Add binary toggle Param to PySpark HashingTF in ML & MLlib | Resolved | Yong Tang
7. PySpark HashingTF hashAlgorithm param + docs | Resolved | Unassigned