Details
Type: Umbrella
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Description
Investigate improvements to Spark ML feature hashing (see, e.g., scikit-learn's FeatureHasher: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html#sklearn.feature_extraction.FeatureHasher).
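The idea behind the linked scikit-learn FeatureHasher is the hashing trick. Each feature name is hashed straight to a bucket in a fixed-length vector, so no vocabulary has to be built or stored. scikit-learn can also optionally derive a sign from the hash (its `alternate_sign` option), so that colliding features tend to cancel out rather than pile up. What follows is a minimal sketch under stated assumptions. `hash_pairs` is a hypothetical helper. CRC-32 from Python's `zlib` stands in for the signed 32-bit MurmurHash3 that scikit-learn actually uses, so the bucket indices will not match either library.

```python
import zlib
from typing import Iterable, List, Tuple


def _hash32(token: str) -> int:
    # Stand-in unsigned 32-bit hash (CRC-32); scikit-learn uses signed MurmurHash3.
    return zlib.crc32(token.encode("utf-8"))


def hash_pairs(pairs: Iterable[Tuple[str, float]], n_features: int = 16,
               alternate_sign: bool = True) -> List[float]:
    """Hashing trick: fold (feature_name, value) pairs into a fixed-length vector."""
    vec = [0.0] * n_features
    for name, value in pairs:
        h = _hash32(name)
        index = h % n_features  # bucket comes from the hash; no vocabulary is stored
        # Optionally flip the sign using the hash's top bit, so that colliding
        # features tend to cancel out instead of always adding up.
        sign = -1.0 if (alternate_sign and h & 0x80000000) else 1.0
        vec[index] += sign * value
    return vec


doc = [("dog", 1.0), ("cat", 2.0), ("run", 1.0)]
print(hash_pairs(doc, n_features=8))
```

The trade-off is that distinct features can land in the same bucket. A larger `n_features` makes such collisions rarer, at the cost of a wider vector.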
Issue Links
- contains
  - SPARK-21748 Migrate the implementation of HashingTF from MLlib to ML (Resolved)
  - SPARK-21481 Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF (Resolved)
- relates to
  - SPARK-22801 Allow FeatureHasher to specify numeric columns to treat as categorical (Resolved)
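Two of the linked issues touch small API points. SPARK-21481 asks for an `indexOf` method on `ml.feature.HashingTF`. SPARK-22801 asks for a way to make FeatureHasher treat chosen numeric columns as categorical. Spark's FeatureHasher documentation describes the default behaviour this way: numeric columns are hashed by column name and keep their value, while string and boolean columns are hashed as `column_name=value` with an indicator value of 1.0. The sketch below mimics that behaviour in plain Python, under several assumptions. `index_of`, `hash_row` and the `categorical_cols` argument are hypothetical names; Spark's own param for this is `categoricalCols`. CRC-32 stands in for Spark's MurmurHash3, so the bucket indices will not match Spark's.

```python
import zlib
from typing import Dict, List, Optional, Set, Union

Value = Union[bool, int, float, str]


def index_of(term: str, num_features: int) -> int:
    """Bucket for a term; hypothetical analogue of HashingTF.indexOf (SPARK-21481)."""
    # CRC-32 is a stand-in: Spark uses MurmurHash3, so real indices will differ.
    return zlib.crc32(term.encode("utf-8")) % num_features


def hash_row(row: Dict[str, Value], num_features: int,
             categorical_cols: Optional[Set[str]] = None) -> List[float]:
    """FeatureHasher-style encoding of one record into a dense vector."""
    categorical_cols = categorical_cols or set()
    vec = [0.0] * num_features
    for col, value in row.items():
        if isinstance(value, (bool, str)) or col in categorical_cols:
            # Categorical: hash "column=value" and add an indicator of 1.0.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            vec[index_of(f"{col}={text}", num_features)] += 1.0
        else:
            # Numeric: hash the column name and add the raw value.
            vec[index_of(col, num_features)] += float(value)
    return vec


row = {"real": 2.0, "bool": True, "string": "foo", "int_code": 3}
print(hash_row(row, 16))                                  # int_code hashed as a number
print(hash_row(row, 16, categorical_cols={"int_code"}))   # int_code hashed as "int_code=3"
```

Forcing a column such as `int_code` onto the categorical path stops the model from reading its values as magnitudes. That matters when the numbers are really category IDs.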
Sub-Tasks
| # | Summary | Status | Assignee |
|---|---------|--------|----------|
| 1 | Add binary toggle Param to ml.HashingTF | Resolved | Bryan Cutler |
| 2 | Use MurmurHash3 for hashing String features | Closed | Yanbo Liang |
| 3 | Extend input format that feature hashing can handle | Resolved | Nicholas Pentreath |
| 4 | HashingTF should extend UnaryTransformer | Resolved | Unassigned |
| 5 | HashingTF should use MurmurHash3 | Resolved | Yanbo Liang |
| 6 | Add binary toggle Param to PySpark HashingTF in ML & MLlib | Resolved | Yong Tang |
| 7 | PySpark HashingTF hashAlgorithm param + docs | Resolved | Unassigned |
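Several of the sub-tasks above concern two changes to HashingTF: switching its hash function to MurmurHash3, and adding a `binary` toggle that records term presence instead of term counts. The following is a minimal sketch of both, resting on a few assumptions. It includes a pure-Python port of the reference MurmurHash3 x86_32 routine. `hashing_tf` is a hypothetical helper that returns a sparse `{index: value}` dict. The defaults (2^18 features, seed 42) are chosen to echo Spark's HashingTF, but the indices are not guaranteed to match Spark's output.

```python
from typing import Dict, Iterable


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 (reference algorithm), returning an unsigned 32-bit int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    # Body: process 4-byte little-endian blocks.
    for i in range(n_blocks):
        h ^= mix_k(int.from_bytes(data[4 * i: 4 * i + 4], "little"))
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask
    # Tail: the remaining 0-3 bytes, read little-endian.
    tail = data[4 * n_blocks:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))
    # Finalization: mix in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def hashing_tf(terms: Iterable[str], num_features: int = 1 << 18,
               binary: bool = False, seed: int = 42) -> Dict[int, float]:
    """Term-frequency vector as a sparse {index: value} dict."""
    vec: Dict[int, float] = {}
    for term in terms:
        index = murmur3_32(term.encode("utf-8"), seed) % num_features
        if binary:
            vec[index] = 1.0  # presence only: repeated terms still map to 1.0
        else:
            vec[index] = vec.get(index, 0.0) + 1.0  # raw term counts
    return vec


words = "a b a c a".split()
print(hashing_tf(words, num_features=16))
print(hashing_tf(words, num_features=16, binary=True))
```

The Spark documentation for the `binary` param describes it as useful for discrete probabilistic models that model binary events rather than integer counts. The toggle leaves the set of non-zero buckets unchanged and only clamps their values to 1.0.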