SPARK-22905

Fix ChiSqSelectorModel, GaussianMixtureModel save implementation for Row order issues

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.3.0
    • Component/s: MLlib
    • Labels: None

    Description

      Currently, the `save` implementation in `ChiSqSelectorModel` does:

      spark.createDataFrame(dataArray).repartition(1).write...
      

      `createDataFrame` uses `defaultParallelism` as its default number of partitions, so the local array may be split across several partitions.
      The current RoundRobinPartitioning does not guarantee that `repartition(1)` yields rows in the same order as the local array. We need to fix this.
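
The ordering problem described above can be illustrated with a small pure-Python model. This is only a sketch, not Spark code: the helper names `split_into_partitions` and `repartition_to_one` are hypothetical, and the "arrival order" of partitions is an assumption standing in for whatever order the shuffle happens to deliver blocks in. It shows that once a local array is spread over several partitions, collapsing them into one does not necessarily reproduce the original order, while data that starts in a single partition has nothing to reorder.

```python
from typing import List, Sequence


def split_into_partitions(data: Sequence[int], num_partitions: int) -> List[List[int]]:
    """Slice a local array into contiguous partitions (a simplified model)."""
    n = len(data)
    return [
        list(data[i * n // num_partitions:(i + 1) * n // num_partitions])
        for i in range(num_partitions)
    ]


def repartition_to_one(partitions: List[List[int]], arrival_order: Sequence[int]) -> List[int]:
    """Merge all partitions into one, in the order their blocks happen to arrive.

    `arrival_order` is an assumption for illustration: it models the fact
    that the merge order across upstream partitions is not guaranteed.
    """
    merged: List[int] = []
    for idx in arrival_order:
        merged.extend(partitions[idx])
    return merged


data = list(range(8))

# Several upstream partitions: the result depends on arrival order.
parts = split_into_partitions(data, 4)
in_order = repartition_to_one(parts, [0, 1, 2, 3])
out_of_order = repartition_to_one(parts, [2, 0, 3, 1])

# A single upstream partition: there is only one possible order.
single = repartition_to_one(split_into_partitions(data, 1), [0])

print(in_order == data)      # order preserved only if blocks arrive in order
print(out_of_order == data)  # order lost when they do not
print(single == data)        # order preserved regardless
```

Under this model, two directions would avoid the problem: keep the data in a single partition from the start so that no shuffle is needed, or store an explicit row index alongside each row and sort by it when loading. Whether either matches the eventual fix is not stated in this issue.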

          People

            Assignee: Weichen Xu (weichenxu123)
            Reporter: Weichen Xu (weichenxu123)
            Shepherd: Joseph K. Bradley
            Votes: 0
            Watchers: 4

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified