SPARK-7511: PySpark ML seed Param should be varied per class

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: ML, PySpark
    • Labels:
      None
    • Target Version/s:

      Description

      Currently, Scala's HasSeed mix-in uses a random Long as the default value for seed, while Python uses a fixed value of 42. After discussion, we've decided to use a seed that varies with the class name but is fixed rather than random. This makes the default behavior reproducible instead of random. Users will still be able to set the seed explicitly.

      The default seed should be derived from a hash of the class name.
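A minimal sketch of the idea in Python follows. It assumes MD5 from the standard-library `hashlib` module as the hash, plus a hypothetical helper `default_seed_for` and a hypothetical mix-in `HasSeedSketch`. None of these come from the issue or from PySpark's actual `HasSeed`. The sketch uses a stable digest because Python 3 randomizes the built-in `hash()` on `str` per process unless `PYTHONHASHSEED` is set, and that randomization would undo the reproducibility the issue asks for. The digest is truncated to a signed 64-bit integer so the value fits in a JVM Long.

```python
import hashlib


def default_seed_for(cls):
    """Hypothetical helper: derive a fixed 64-bit seed from a class name.

    MD5 is an assumption (the issue only asks for "a hash of the class
    name"); unlike built-in hash() on str, it is identical in every process.
    """
    digest = hashlib.md5(cls.__name__.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a signed 64-bit integer so the value
    # fits in the range of a JVM Long.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


class HasSeedSketch:
    """Hypothetical stand-in for a seed-carrying mix-in (not PySpark's API)."""

    def __init__(self, seed=None):
        # Fixed per-class default; an explicit seed (including 0) overrides it.
        self.seed = default_seed_for(type(self)) if seed is None else seed


class EstimatorA(HasSeedSketch):
    pass


class EstimatorB(HasSeedSketch):
    pass
```

With this sketch, `EstimatorA().seed` is the same on every run and in every process, `EstimatorB()` gets a different default, and `EstimatorA(seed=7)` keeps the user's explicit seed. Because the sketch keys on the bare class name, two classes with the same name in different modules would share a default. Hashing the module-qualified name would avoid that.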

      Scala's seed will be fixed in a separate patch.

              People

              • Assignee: Holden Karau (holdenk_amp)
              • Reporter: Joseph K. Bradley (josephkb)
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                • Updated:
                • Resolved: