Spark / SPARK-17587

SparseVector __getitem__ should follow __getitem__ contract

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 2.0.0
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: ML, MLlib, PySpark
    • Labels: None

    Description

      According to the __getitem__ contract:

      if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised.

      This is required, for example, for correct iteration over the structure.

      Right now it raises ValueError instead, which leads to confusing behavior: iterating over a vector never terminates cleanly and fails with a ValueError once the index runs past the end:

      In [1]: from pyspark.mllib.linalg import SparseVector
      
      In [2]: list(SparseVector(4, [0], [0]))
      ---------------------------------------------------------------------------
      ValueError                                Traceback (most recent call last)
      <ipython-input-2-147f3bb0a47d> in <module>()
      ----> 1 list(SparseVector(4, [0], [0]))
      
      /opt/spark-2.0/python/pyspark/mllib/linalg/__init__.py in __getitem__(self, index)
          803 
          804         if index >= self.size or index < -self.size:
      --> 805             raise ValueError("Index %d out of bounds." % index)
          806         if index < 0:
          807             index += self.size
      
      ValueError: Index 4 out of bounds.
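
      The mechanism behind the traceback above can be reproduced without Spark. The following is a minimal sketch, assuming a hypothetical stand-in class (TinySparseVector and its error parameter are illustrative names, not part of PySpark). An object that defines __getitem__ but not __iter__ is iterated by calling __getitem__ with 0, 1, 2, ... until IndexError is raised, so any other exception type propagates to the caller instead of ending the loop.

```python
class TinySparseVector:
    """Hypothetical stand-in for SparseVector; not the PySpark class."""

    def __init__(self, size, indices, values, error=IndexError):
        self.size = size
        self._data = dict(zip(indices, values))
        self._error = error  # exception type raised for out-of-range indexes

    def __getitem__(self, index):
        # Same bounds check as in the traceback, with a configurable exception.
        if index >= self.size or index < -self.size:
            raise self._error("Index %d out of bounds." % index)
        if index < 0:
            index += self.size
        # Entries that are not stored explicitly are treated as zero.
        return self._data.get(index, 0.0)


# With IndexError, the legacy sequence iteration protocol stops cleanly.
ok = list(TinySparseVector(4, [0], [1.0]))

# With ValueError, the same iteration escapes with the error raised at index 4.
try:
    list(TinySparseVector(4, [0], [1.0], error=ValueError))
    failed = None
except ValueError as exc:
    failed = str(exc)

print(ok)
print(failed)
```

      In this sketch the only difference between the two runs is the exception type, which is presumably all that needs to change in the real __getitem__ as well.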
      

          People

            Assignee: zero323 Maciej Szymkiewicz
            Reporter: zero323 Maciej Szymkiewicz