  1. Apache Arrow
  2. ARROW-6607

[Python] Support for set/list columns when converting from Pandas


Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0.0
    • Component/s: Python
    • Labels: None
    • Environment: python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10

    Description

      Hi,

      Using python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10...

      ```python
      import pandas as pd

      df = pd.DataFrame({'a': [1,2,3], 'b': [set([1,2]), set([2,3]), set([3,4,5])]})

      df.to_feather('test.ft')
      ```

      I get:

      ```
      Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
      to_feather(self, fname)
      File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
      feather.write_feather(df, path)
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
      writer.write(df)
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 93, in write
      table = Table.from_pandas(df, preserve_index=False)
      File "pyarrow/table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in dataframe_to_arrays
      for c, f in zip(columns_to_convert, convert_fields)]
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in <listcomp>
      for c, f in zip(columns_to_convert, convert_fields)]
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 487, in convert_column
      raise e
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 481, in convert_column
      result = pa.array(col, type=type_, from_pandas=True, safe=safe)
      File "pyarrow/array.pxi", line 191, in pyarrow.lib.array
      File "pyarrow/array.pxi", line 78, in pyarrow.lib._ndarray_to_array
      File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: ('Could not convert {1, 2} with type set: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column b with type object')
      ```

      And obviously `df.drop('b', axis=1).to_feather('test.ft')` works.

      Questions:
      (1) Is it possible to support this kind of set/list column?
      (2) Does anyone have an idea how to deal with this? I cannot unnest these set/list columns, as this would explode the DataFrame. My only other idea is to convert a set such as `{1,2}` into the string `1,2` and parse it after reading the file, hoping it won't be slow.
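
The string round trip mentioned in question (2) could look roughly like the sketch below. It assumes every set holds only integers; `encode_set` and `decode_set` are hypothetical helper names, not part of pandas or pyarrow, and the speed concern is not measured here.

```python
def encode_set(values):
    """Serialize a set of ints as a comma-separated string, e.g. {1, 2} -> '1,2'."""
    # Sort so that the same set always produces the same string.
    return ",".join(str(v) for v in sorted(values))


def decode_set(text):
    """Parse a string produced by encode_set back into a set of ints."""
    # An empty string stands for the empty set.
    return {int(part) for part in text.split(",")} if text else set()


# Same values as column 'b' in the report, kept as a plain list here.
column = [{1, 2}, {2, 3}, {3, 4, 5}]
encoded = [encode_set(s) for s in column]
decoded = [decode_set(t) for t in encoded]
```

On a DataFrame, the same helpers could presumably be applied with `df['b'].map(encode_set)` before writing and `df['b'].map(decode_set)` after reading.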

       

      Update:

      With a list column the error is different:

      ```python
      import pandas as pd

      df = pd.DataFrame({'a': [1,2,3], 'b': [[1,2], [2,3], [3,4,5]]})

      df.to_feather('test.ft')
      ```

      ```
      Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
      to_feather(self, fname)
      File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
      feather.write_feather(df, path)
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
      writer.write(df)
      File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 97, in write
      self.writer.write_array(name, col.data.chunk(0))
      File "pyarrow/feather.pxi", line 67, in pyarrow.lib.FeatherWriter.write_array
      File "pyarrow/error.pxi", line 93, in pyarrow.lib.check_status
      pyarrow.lib.ArrowNotImplementedError: list<item: int64>
      ```
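
The second traceback gets as far as a `list<item: int64>` array before the Feather writer rejects it, which suggests that list values pass type inference where sets do not. Below is a minimal sketch of normalizing sets to lists before conversion, assuming the elements are sortable and their order does not matter. `set_to_sorted_list` is a hypothetical helper, not part of pandas or pyarrow, and it would only address the first error; the `ArrowNotImplementedError` from the Feather writer is a separate problem.

```python
def set_to_sorted_list(value):
    """Return a sorted list for a set or frozenset; pass other values through unchanged."""
    if isinstance(value, (set, frozenset)):
        return sorted(value)
    return value


# Same values as column 'b' in the first example of the report.
column = [{1, 2}, {2, 3}, {3, 4, 5}]
as_lists = [set_to_sorted_list(v) for v in column]
```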

          People

            Assignee: amol- Alessandro Molina
            Reporter: gsimchoni Giora Simchoni