Apache Arrow / ARROW-6872

[C++][Python] Empty table with dictionary-columns raises ArrowNotImplementedError

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.15.0
    • Fix Version/s: 1.0.0
    • Component/s: C++, Python
    • Labels:
      None

      Description

      Abstract

      As a pyarrow user, I expect to be able to create an empty table from any schema that was derived from a pandas DataFrame. This currently fails for dictionary types (e.g. columns with the pandas "category" dtype).

      Test Case

      This code:

      import pandas as pd
      import pyarrow as pa
      
      df = pd.DataFrame({"x": pd.Series(["x", "y"], dtype="category")})
      table = pa.Table.from_pandas(df)
      schema = table.schema
      table_empty = schema.empty_table()  # boom
      

      produces this exception:

      Traceback (most recent call last):
        File "arrow_bug.py", line 8, in <module>
          table_empty = schema.empty_table()
        File "pyarrow/types.pxi", line 860, in __iter__
        File "pyarrow/array.pxi", line 211, in pyarrow.lib.array
        File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
        File "pyarrow/error.pxi", line 86, in pyarrow.lib.check_status
      pyarrow.lib.ArrowNotImplementedError: Sequence converter for type dictionary<values=string, indices=int8, ordered=0> not implemented
      

            People

            • Assignee: Unassigned
            • Reporter: Marco Neumann (marco.neumann.by)