Apache Arrow / ARROW-7986

[Python] pa.Array.from_pandas cannot convert pandas.Series containing pyspark.ml.linalg.SparseVector


Details

    Description

      The following code:

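      import pandas as pd
      import pyarrow as pa
      from pyspark.ml.linalg import SparseVector
      
      # Build a sparse vector of size 2 from an {index: value} mapping.
      sparse_values = {0: 0.1, 1: 1.1}
      sparse_vector = SparseVector(len(sparse_values), sparse_values)
      # pandas wraps the vector in an object-dtype Series.
      pds = pd.Series(sparse_vector)
      # Raises ArrowInvalid: no Arrow type can be inferred for SparseVector.
      pa.array(pds)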

      results in the following error:

      pyarrow/array.pxi:191: in pyarrow.lib.array
       ???
      pyarrow/array.pxi:78: in pyarrow.lib._ndarray_to_array
       ???
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      > ???
      E pyarrow.lib.ArrowInvalid: Could not convert (2,[0,1],[0.1,1.1]) with type SparseVector: did not recognize Python value type when inferring an Arrow data type
      pyarrow/error.pxi:85: ArrowInvalid
      

      My original goal was to check whether databricks.koalas supports SparseVector values, which led me to this error coming from pyarrow:

      import pandas as pd
      import databricks.koalas as ks
      from pyspark.ml.linalg import SparseVector
      
      sparse_values = {0: 0.1, 1: 1.1}
      sparse_vector = SparseVector(len(sparse_values), sparse_values)
      pds = pd.Series(sparse_vector)  # works
      kss = ks.Series(sparse_vector)  # raises ArrowInvalid
      

      While pd.Series on the SparseVector works fine, the last line (ks.Series) fails with:

      databricks/koalas/typedef.py:176: in infer_pd_series_spark_type
       return from_arrow_type(pa.Array.from_pandas(s).type)
      pyarrow/array.pxi:593: in pyarrow.lib.Array.from_pandas
       ???
      pyarrow/array.pxi:191: in pyarrow.lib.array
       ???
      pyarrow/array.pxi:78: in pyarrow.lib._ndarray_to_array
       ???
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      > ???
      E pyarrow.lib.ArrowInvalid: Could not convert (2,[0,1],[0.1,1.1]) with type SparseVector: did not recognize Python value type when inferring an Arrow data type
      pyarrow/error.pxi:85: ArrowInvalid
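
      A possible workaround, sketched under assumptions rather than taken from pyarrow or koalas: flatten each vector into plain Python types before handing it to Arrow, since type inference works for ints, floats, lists and dicts. FakeSparseVector and sparse_to_record are hypothetical names used only for illustration; the stand-in class mirrors the size, indices and values attributes of pyspark.ml.linalg.SparseVector so the sketch runs without Spark. The struct layout is one arbitrary choice, and pyarrow is treated as optional here.

```python
from dataclasses import dataclass
from typing import Sequence

try:
    import pyarrow as pa  # optional: only needed for the final conversion
except ImportError:
    pa = None


@dataclass
class FakeSparseVector:
    """Hypothetical stand-in with the same size/indices/values attributes
    as pyspark.ml.linalg.SparseVector."""
    size: int
    indices: Sequence[int]
    values: Sequence[float]


def sparse_to_record(vec) -> dict:
    """Flatten a sparse vector into plain Python types (hypothetical helper).

    int() and float() also normalise NumPy scalars, which is what the real
    SparseVector stores in its indices and values arrays.
    """
    return {
        "size": int(vec.size),
        "indices": [int(i) for i in vec.indices],
        "values": [float(v) for v in vec.values],
    }


vectors = [FakeSparseVector(2, [0, 1], [0.1, 1.1])]
records = [sparse_to_record(v) for v in vectors]
print(records)

if pa is not None:
    # A list of dicts is inferred as a struct array, so no custom type is needed.
    arr = pa.array(records)
    print(arr.type)
```

      With a real SparseVector the same helper should apply unchanged, but the result is a struct column, not a vector type, so any consumer would have to rebuild the vector from its three fields.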
      

          People

            Unassigned
            nikilp Nikolay Petrov