  1. Spark
  2. SPARK-12780

Inconsistent return values of ML Python models' properties

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: ML, PySpark
    • Labels:
      None

      Description

      In spark/python/pyspark/ml/feature.py, StringIndexerModel has a property named labels that is implemented differently from the properties of other models.

      In StringIndexerModel:

      StringIndexerModel
          @property
          @since("1.5.0")
          def labels(self):
              """
              Ordered list of labels, corresponding to indices to be assigned.
              """
              return self._java_obj.labels
      

      In CountVectorizerModel (as an example):

      CountVectorizerModel
          @property
          @since("1.6.0")
          def vocabulary(self):
              """
              An array of terms in the vocabulary.
              """
              return self._call_java("vocabulary")
      

      In StringIndexerModel, the value returned by labels is not an array of labels as expected. Instead, it is a py4j JavaMember, because the Java member is referenced but never called.
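The difference between the two patterns can be sketched without a JVM. This is only an illustration under stated assumptions: FakeJavaModel, BrokenModel, and FixedModel are hypothetical stand-ins, a plain bound method plays the role of the py4j JavaMember, and the _call_java shown here is a simplified imitation of the helper used in the CountVectorizerModel snippet, not Spark's actual implementation.

```python
class FakeJavaModel:
    """Hypothetical stand-in for the JVM-side model; not part of py4j or Spark."""

    def labels(self):
        # On the JVM side, the labels are obtained through a method call.
        return ("a", "b", "c")


class BrokenModel:
    """Mirrors the reported pattern: the member is referenced but never invoked."""

    def __init__(self, java_obj):
        self._java_obj = java_obj

    @property
    def labels(self):
        # Returns the callable member itself (here a bound method), not the data.
        return self._java_obj.labels


class FixedModel(BrokenModel):
    """Mirrors the CountVectorizerModel pattern: invoke the member by name."""

    def _call_java(self, name, *args):
        # Simplified, assumed helper: look up the member by name and call it.
        return getattr(self._java_obj, name)(*args)

    @property
    def labels(self):
        return list(self._call_java("labels"))


java_obj = FakeJavaModel()
broken = BrokenModel(java_obj).labels
fixed = FixedModel(java_obj).labels

print(callable(broken))  # True: an uncalled member, not a list of labels
print(fixed)             # ['a', 'b', 'c']
```

In this sketch the broken property hands the caller something that still has to be called, which is the same kind of surprise as receiving a JavaMember where a list was expected.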

      In addition, the Pickle-based deserialization on the Python side does not map Scala arrays to a single Python type. In my experiments, Array[String] is translated into a tuple and Array[Int] into array.array. Code that expects a list in both cases may therefore fail.
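One way a caller could guard against the mixed return types described above is to normalize them to a list. The sketch below is an assumption for illustration only: to_list is a hypothetical helper, not part of PySpark, and the sample values merely imitate what the report says the deserializer produces.

```python
from array import array


def to_list(value):
    """Hypothetical helper: turn a deserialized Scala array into a Python list."""
    # Per the report, Array[String] arrives as a tuple and Array[Int] as array.array.
    if isinstance(value, (tuple, array)):
        return list(value)
    # Anything else is passed through unchanged.
    return value


labels = to_list(("a", "b"))            # imitates a deserialized Array[String]
indices = to_list(array("i", [0, 1]))   # imitates a deserialized Array[Int]

print(labels)
print(indices)
```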

            People

            • Assignee:
              yinxusen Xusen Yin
              Reporter:
              yinxusen Xusen Yin