Apache Arrow / ARROW-2194

[Python] Pandas columns metadata incorrect for empty string columns

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: Python
    • Labels: None

    Description

      The pandas_type recorded in the schema metadata for bytes or unicode columns of an empty pandas DataFrame is unexpectedly float64, even though the numpy_type is object.

       

      import json

      import numpy as np
      import pandas as pd
      import pyarrow as pa

      # Build an empty DataFrame with one unicode and one bytes column.
      empty_df = pd.DataFrame({
          'unicode': np.array([], dtype=np.unicode_),
          'bytes': np.array([], dtype=np.bytes_),
      })
      empty_table = pa.Table.from_pandas(empty_df)

      # Inspect the per-column pandas metadata stored in the Arrow schema.
      json.loads(empty_table.schema.metadata[b'pandas'])['columns']
      
      # Same behavior for input dtype np.unicode_
      [{u'field_name': u'bytes',
      u'metadata': None,
      u'name': u'bytes',
      u'numpy_type': u'object',
      u'pandas_type': u'float64'},
      {u'field_name': u'unicode',
      u'metadata': None,
      u'name': u'unicode',
      u'numpy_type': u'object',
      u'pandas_type': u'float64'},
      {u'field_name': u'__index_level_0__',
      u'metadata': None,
      u'name': None,
      u'numpy_type': u'int64',
      u'pandas_type': u'int64'}]

       

      Tested on Debian 8 with Python 2.7 and Python 3.6.4.

          People

            Assignee: Unassigned
            Reporter: Florian Jetter (fjetter)
            Votes: 0
            Watchers: 3
