  1. Apache Arrow
  2. ARROW-2194

[Python] Pandas columns metadata incorrect for empty string columns

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: Python
    • Labels:
      None

      Description

      The pandas_type recorded in the schema metadata for bytes or unicode columns of an empty pandas DataFrame is unexpectedly float64, even though the numpy_type is object:

       

      import numpy as np
      import pandas as pd
      import pyarrow as pa
      import json
      
      empty_df = pd.DataFrame({'unicode': np.array([], dtype=np.unicode_), 'bytes': np.array([], dtype=np.bytes_)})
      empty_table = pa.Table.from_pandas(empty_df)
      json.loads(empty_table.schema.metadata[b'pandas'])['columns']
      
      # Same behavior for input dtype np.unicode_
      [{u'field_name': u'bytes',
      u'metadata': None,
      u'name': u'bytes',
      u'numpy_type': u'object',
      u'pandas_type': u'float64'},
      {u'field_name': u'unicode',
      u'metadata': None,
      u'name': u'unicode',
      u'numpy_type': u'object',
      u'pandas_type': u'float64'},
      {u'field_name': u'__index_level_0__',
      u'metadata': None,
      u'name': None,
      u'numpy_type': u'int64',
      u'pandas_type': u'int64'}]

       

      Tested on Debian 8 with Python 2.7 and Python 3.6.4.
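
To make the mismatch easier to spot, the following is a minimal sketch that scans the `pandas` metadata JSON for columns whose `numpy_type` is `object` but whose `pandas_type` is numeric. The helper name `find_mismatched_columns` is hypothetical and not part of pyarrow; the sample metadata simply mirrors the output reported above, so the snippet runs with the standard library alone.

```python
import json


def find_mismatched_columns(pandas_metadata_bytes):
    """Return field names stored as numpy 'object' but tagged with a numeric pandas_type.

    Hypothetical helper for illustration; it only inspects the JSON layout
    shown in the report above.
    """
    columns = json.loads(pandas_metadata_bytes.decode("utf-8"))["columns"]
    numeric_prefixes = ("float", "int", "uint")
    return [
        col["field_name"]
        for col in columns
        if col["numpy_type"] == "object"
        and col["pandas_type"].startswith(numeric_prefixes)
    ]


# Sample metadata copied from the reported output (not generated by pyarrow here).
sample = json.dumps({"columns": [
    {"field_name": "bytes", "metadata": None, "name": "bytes",
     "numpy_type": "object", "pandas_type": "float64"},
    {"field_name": "unicode", "metadata": None, "name": "unicode",
     "numpy_type": "object", "pandas_type": "float64"},
    {"field_name": "__index_level_0__", "metadata": None, "name": None,
     "numpy_type": "int64", "pandas_type": "int64"},
]}).encode("utf-8")

print(find_mismatched_columns(sample))
```

With pyarrow installed, the same helper could presumably be given `empty_table.schema.metadata[b'pandas']` from the reproducer above instead of the hand-written sample.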

            People

            • Assignee: Unassigned
            • Reporter: Florian Jetter (fjetter)