[SPARK-22905] Fix ChiSqSelectorModel, GaussianMixtureModel save implementation for Row order issues - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.1
Fix Version/s: 2.3.0
Component/s: MLlib
Labels: None

Description

Currently, `ChiSqSelectorModel` saves its data with:

spark.createDataFrame(dataArray).repartition(1).write...

By default, createDataFrame splits the local array into "defaultParallelism" partitions. The current RoundRobinPartitioning does not guarantee that "repartition" produces rows in the same order as the local array, so the saved rows may be ordered differently from `dataArray`. We need to fix this.
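
To illustrate the ordering problem described above, here is a minimal pure-Python sketch. It is a toy model, not Spark code: it assumes that collapsing several partitions into one amounts to concatenating the partition chunks in whatever order they happen to arrive. The helper names (`split_into_partitions`, `merge_in_arrival_order`, `count_order_preserving_arrivals`) and the 12-element `data_array` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def split_into_partitions(rows, num_partitions):
    """Slice a local list into contiguous chunks, one per partition."""
    n = len(rows)
    return [
        rows[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def merge_in_arrival_order(partitions, arrival_order):
    """Toy model of collapsing to one partition: concatenate chunks as they arrive."""
    return [row for idx in arrival_order for row in partitions[idx]]


def count_order_preserving_arrivals(rows, num_partitions):
    """Return (arrival orders that reproduce `rows`, total arrival orders)."""
    partitions = split_into_partitions(rows, num_partitions)
    orders = list(permutations(range(num_partitions)))
    preserved = sum(
        1 for order in orders
        if merge_in_arrival_order(partitions, order) == rows
    )
    return preserved, len(orders)


data_array = list(range(12))  # hypothetical stand-in for the local dataArray

multi = count_order_preserving_arrivals(data_array, 4)   # several input partitions
single = count_order_preserving_arrivals(data_array, 1)  # one input partition
print(multi, single)  # (1, 24) (1, 1)
```

In this model, only 1 of the 24 possible arrival orders for four partitions reproduces the original array, while a single input partition has exactly one arrival order and so always preserves it. This suggests that keeping the data in one partition from the start is one way to avoid the reordering, though the sketch makes no claim about how the linked pull requests actually resolve it.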

Issue Links

links to

[Github] Pull Request #20088 (WeichenXu123)

[Github] Pull Request #20113 (zhengruifeng)

[Github] Pull Request #22079 (bersprockets)

[Github] Pull Request #22211 (henryr)

People

Assignee: Weichen Xu

Reporter: Weichen Xu

Shepherd: Joseph K. Bradley

Votes: 0

Watchers: 4

Dates

Created: 27/Dec/17 04:38

Updated: 23/Aug/18 23:27

Resolved: 29/Dec/17 01:33

Time Tracking

Estimated: 24h

Remaining: 24h

Logged: Not Specified