Spark / SPARK-12780

Inconsistency returning value of ML python models' properties

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: ML, PySpark
    • Labels: None

    Description

      In spark/python/pyspark/ml/feature.py, StringIndexerModel has a property method named labels that behaves differently from the properties of other models.

      In StringIndexerModel:

          @property
          @since("1.5.0")
          def labels(self):
              """
              Ordered list of labels, corresponding to indices to be assigned.
              """
              # Returns a py4j JavaMember (a reference to the Java method), not the labels.
              return self._java_obj.labels
      

      In CountVectorizerModel (as an example):

          @property
          @since("1.6.0")
          def vocabulary(self):
              """
              An array of terms in the vocabulary.
              """
              # _call_java invokes the Java method and converts the result to Python.
              return self._call_java("vocabulary")
      

      In StringIndexerModel, labels does not return a list of labels as expected; instead it returns a py4j JavaMember, because self._java_obj.labels only looks up the Java method without calling it.
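      The difference between looking up a Java method and calling it can be reproduced without a JVM. The sketch below uses hypothetical stand-in classes (FakeJavaMember, FakeJavaModel) that only mimic how py4j exposes Java methods; they are not part of py4j or PySpark.

```python
# Hypothetical stand-ins that mimic py4j behaviour; these are NOT real py4j classes.
class FakeJavaMember:
    """Mimics a py4j JavaMember: a method reference that runs only when called."""

    def __init__(self, result):
        self._result = result

    def __call__(self):
        # Calling the member is what actually invokes the (simulated) Java method.
        return self._result


class FakeJavaModel:
    """Mimics the py4j proxy for a Scala StringIndexerModel exposing labels()."""

    def __getattr__(self, name):
        # py4j resolves attribute lookups on a Java object to method references.
        if name == "labels":
            return FakeJavaMember(["a", "b", "c"])
        raise AttributeError(name)


java_obj = FakeJavaModel()
buggy = java_obj.labels      # method reference only, like self._java_obj.labels
fixed = java_obj.labels()    # the method's result, i.e. the actual labels
print(type(buggy).__name__)  # FakeJavaMember
print(fixed)                 # ['a', 'b', 'c']
```

      In the real model, routing the property through self._call_java("labels"), as CountVectorizerModel does for vocabulary, would both call the Java method and pass the result through PySpark's Java-to-Python conversion.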

      Moreover, pickling on the Python side does not deserialize Scala arrays consistently. In my experiments, Array[String] is converted into a tuple and Array[Int] into an array.array, which may lead to errors.
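      One kind of error this can cause is a failed equality check: ("a", "b") == ["a", "b"] is False in Python. A minimal sketch of a caller-side workaround, assuming the tuple and array.array results described above, is a hypothetical helper (to_python_list, not a PySpark function) that normalizes both into plain lists:

```python
from array import array


def to_python_list(value):
    """Hypothetical helper: normalize sequences that may come back from the JVM.

    Per the observations above, Array[String] may arrive as a tuple and
    Array[Int] as an array.array; both are converted to a plain list here.
    Any other value is returned unchanged.
    """
    if isinstance(value, (tuple, array)):
        return list(value)
    return value


labels = to_python_list(("a", "b", "c"))          # simulated Array[String] result
indices = to_python_list(array("i", [0, 1, 2]))   # simulated Array[Int] result
print(labels)   # ['a', 'b', 'c']
print(indices)  # [0, 1, 2]
```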

People

    • Assignee: Xusen Yin (yinxusen)
    • Reporter: Xusen Yin (yinxusen)
    • Votes: 0
    • Watchers: 3
