  1. Spark
  2. SPARK-12751

Traits generated by SharedParamsCodeGen should not be private


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.5.2, 1.6.0
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

    Description

      Many Estimators and Transformers mix in traits generated by SharedParamsCodeGen. These estimators and transformers (such as StringIndexer and MinMaxScaler) are public, while the traits generated by SharedParamsCodeGen are private[ml]. User code can invoke the methods that the traits introduce, but it cannot refer to any of the traits explicitly. For example, you can call setInputCol(str) on a StringIndexer, but you cannot assign a StringIndexer to a variable of type HasInputCol.

      val x: HasInputCol = new StringIndexer() // Usage of HasInputCol is illegal.
      

      As a result, it is impossible to create a collection of transformers that have both HasInputCol and HasOutputCol (e.g. Set[Transformer with HasInputCol with HasOutputCol]). Instead, we have to use structural typing and reflective calls, like this:

      ml.Estimator[_] { val outputCol: ml.param.Param[String] }
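
To make the workaround concrete, here is a minimal, self-contained sketch that does not depend on Spark. The `mllike` object and everything in it are hypothetical stand-ins that only mimic the visibility setup described above (a trait with qualified-private visibility mixed into public classes). Scala 2 is assumed, as used by Spark 1.x.

```scala
import scala.language.reflectiveCalls

// Hypothetical stand-ins that mimic the visibility setup; this is not Spark code.
object mllike {
  class Param[T](val name: String)

  // Plays the role of a SharedParamsCodeGen trait: not nameable outside `mllike`.
  private[mllike] trait HasOutputCol {
    final val outputCol: Param[String] = new Param[String]("outputCol")
  }

  abstract class Estimator
  class StringIndexer extends Estimator with HasOutputCol
  class MinMaxScaler extends Estimator with HasOutputCol
}

object StructuralDemo {
  import mllike._

  // Does not compile outside `mllike`, because the trait is not accessible:
  // val x: mllike.HasOutputCol = new StringIndexer

  // Workaround: describe the member structurally instead of naming the trait.
  type WithOutputCol = Estimator { val outputCol: Param[String] }

  val estimators: Seq[WithOutputCol] = Seq(new StringIndexer, new MinMaxScaler)

  // Each access to `outputCol` here is a reflective call at runtime.
  val names: Seq[String] = estimators.map(_.outputCol.name)
}
```

The structural type compiles, but every member access goes through Java reflection, which is slower and less readable than naming the trait directly.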
      

      This seems easy to fix: exposing a couple of traits should not break anything. On the other hand, the problem may go deeper than that.
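
As a rough illustration of what public traits would allow, the sketch below uses hypothetical stand-in classes (the `mlpublic` object is not Spark code, and the class names are chosen only for illustration). It assumes the shared-param traits are simply declared public; whether that is safe inside Spark itself is exactly the open question raised above.

```scala
// Hypothetical stand-ins, not Spark code: the shared-param traits are public here.
object mlpublic {
  class Param[T](val name: String)

  trait HasInputCol {
    final val inputCol: Param[String] = new Param[String]("inputCol")
  }
  trait HasOutputCol {
    final val outputCol: Param[String] = new Param[String]("outputCol")
  }

  abstract class Transformer
  class Tokenizer extends Transformer with HasInputCol with HasOutputCol
  class Binarizer extends Transformer with HasInputCol with HasOutputCol
}

object PublicTraitsDemo {
  import mlpublic._

  // The compound type from the description can now be written directly.
  type InOut = Transformer with HasInputCol with HasOutputCol

  val stages: Set[InOut] = Set(new Tokenizer, new Binarizer)

  // Ordinary, statically checked member access; no reflective calls needed.
  val columns: Set[(String, String)] =
    stages.map(t => (t.inputCol.name, t.outputCol.name))
}
```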

            People

              Assignee: Unassigned
              Reporter: Wojciech Jurczyk (wjur)