Apache Arrow / ARROW-8498

[Python] Schema.from_pandas fails on extension type, while Table.from_pandas works

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: Python
    • Labels:
      None

      Description

      While Table.from_pandas() seems to work as expected with pandas extension types,
      Schema.from_pandas() raises an ArrowTypeError on the same DataFrame:

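      import pandas as pd
      import pyarrow as pa

      # One column per pandas extension dtype: nullable integer, categorical, string
      df = pd.DataFrame({
          "x": pd.Series([1, 2, None], dtype="Int8"),
          "y": pd.Series(["a", "b", None], dtype="category"),
          "z": pd.Series(["ab", "bc", None], dtype="string"),
      })
      print(pa.Table.from_pandas(df).schema)  # works
      print(pa.Schema.from_pandas(df))        # raises ArrowTypeError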
      

       
      Results in:

      x: int8
      y: dictionary<values=string, indices=int8, ordered=0>
      z: string
      metadata
      --------
      {b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
                  b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
                  b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
                  b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
                  b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
                  b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
                  b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
                  b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
                  b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
                  b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
                  b'as_version": "1.0.3"}'}
      
      ---------------------------------------------------------------------------
      ArrowTypeError                            Traceback (most recent call last)
      ...
      ArrowTypeError: Did not pass numpy.dtype object
      

      I'd imagine Table.from_pandas(df).schema and Schema.from_pandas(df) should result in the exact same object?

              People

              • Assignee: Uwe Korn (uwe)
              • Reporter: Thomas Buhrmann (buhrmann)