Apache Arrow / ARROW-6899

[Python] to_pandas() not implemented on list<dictionary<values=string, indices=int32>>


    Details

      Description

      Hi,

      pyarrow.Table.to_pandas() fails on an Arrow list column whose child (data) vector is a dictionary-encoded string array. Here is the table schema as printed by pyarrow:

      pyarrow.Table
      encodedList: list<$data$: dictionary<values=string, indices=int32, ordered=0> not null> not null
        child 0, $data$: dictionary<values=string, indices=int32, ordered=0> not null
      metadata
      --------
      OrderedDict() 

      and the data (also attached to this ticket as a file):

      <pyarrow.lib.ChunkedArray object at 0x7f7ea6a748b8>
      [
        [
      
          -- dictionary:
            [
              "a",
              "b",
              "c",
              "d"
            ]
          -- indices:
            [
              0,
              1,
              2
            ],
      
          -- dictionary:
            [
              "a",
              "b",
              "c",
              "d"
            ]
          -- indices:
            [
              0,
              3
            ]
        ]
      ] 

      and the exception I got:

      ---------------------------------------------------------------------------
      ArrowNotImplementedError                  Traceback (most recent call last)
      <ipython-input-10-5f865bc01df1> in <module>
      ----> 1 df.to_pandas()
      
      ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
      
      ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
      
      ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata)
          700 
          701     _check_data_column_metadata_consistency(all_columns)
      --> 702     blocks = _table_to_blocks(options, table, categories)
          703     columns = _deserialize_column_index(table, all_columns, column_indexes)
          704 
      
      ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories)
          972 
          973     # Convert an arrow table to Block from the internal pandas API
      --> 974     result = pa.lib.table_to_blocks(options, block_table, categories)
          975 
          976     # Defined above
      
      ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
      
      ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: dictionary<values=string, indices=int32, ordered=0> 

      Note that the data (child) vector on its own converts successfully with to_pandas().

      It would be great if this could be addressed in the next version of pyarrow. In the meantime, is there anything I can do on my end to work around this unimplemented conversion?

      Thanks,

      Razvan

        Attachments

        1. encoded.arrow (1 kB, Razvan Chitu)


              People

              • Assignee: Wes McKinney (wesm)
              • Reporter: Razvan Chitu (razvanch)
              • Votes: 0
              • Watchers: 4


                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 50m