Apache Arrow / ARROW-2966

[Python] Data type conversion error

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.11.0
    • Component/s: Python
    • Labels: None
    • Environment: linux

    Description

      I have a large pandas DataFrame. When I try to convert it to a pyarrow table, the conversion fails with an error. I'm not sure whether this is a bug or expected behavior.

      I realize the traceback below is not very useful on its own. What can I do to help identify which part of my DataFrame causes the error?

      Here's the error:

       

      In [17]: pa.Table.from_pandas(df)
      ---------------------------------------------------------------------------
      ArrowInvalid Traceback (most recent call last)
      <ipython-input-17-6eac5d0eec08> in <module>()
      ----> 1 pa.Table.from_pandas(df)
      
      table.pxi in pyarrow.lib.Table.from_pandas()
      
      ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads)
      375 arrays = list(executor.map(convert_column,
      376 columns_to_convert,
      --> 377 convert_types))
      378 
      379 types = [x.type for x in arrays]
      
      ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
      584 # Careful not to keep a reference to the popped future
      585 if timeout is None:
      --> 586 yield fs.pop().result()
      587 else:
      588 yield fs.pop().result(end_time - time.time())
      
      ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
      423 raise CancelledError()
      424 elif self._state == FINISHED:
      --> 425 return self.__get_result()
      426 
      427 self._condition.wait(timeout)
      
      ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
      382 def __get_result(self):
      383 if self._exception:
      --> 384 raise self._exception
      385 else:
      386 return self._result
      
      ~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
      54 
      55 try:
      ---> 56 result = self.fn(*self.args, **self.kwargs)
      57 except BaseException as exc:
      58 self.future.set_exception(exc)
      
      ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, ty)
      364 
      365 def convert_column(col, ty):
      --> 366 return pa.array(col, from_pandas=True, type=ty)
      367 
      368 if nthreads == 1:
      
      array.pxi in pyarrow.lib.array()
      
      error.pxi in pyarrow.lib.check_status()
      
      error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: Error converting from Python objects to Double: Got Python object of type str but can only handle these types: float
      
      In [18]: pa.__version__
      Out[18]: '0.9.0'
      
      In [19]: pd.__version__
      Out[19]: '0.23.3'
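
The error message says a column being converted to Double contained a Python str. One way to narrow down which column that is would be to count the Python types found in each column. The following is only a minimal sketch: the helper name `mixed_type_columns` and the sample data are hypothetical and not part of pyarrow or pandas, and the real DataFrame in this report may fail for a different reason.

```python
from collections import Counter


def mixed_type_columns(columns):
    """Return {column name: Counter of type names} for columns with more than one type.

    `columns` is anything whose .items() yields (name, values) pairs,
    for example a plain dict of lists or a pandas DataFrame.
    """
    report = {}
    for name, values in columns.items():
        # None is skipped here on the assumption that it represents a missing value.
        counts = Counter(type(v).__name__ for v in values if v is not None)
        if len(counts) > 1:
            report[name] = counts
    return report


# Hypothetical stand-in for the real data: a mostly-float column with one stray string.
sample = {
    "grade": [3.5, float("nan"), "n/a", 4.0],
    "name": ["a", "b", "c", "d"],
}

report = mixed_type_columns(sample)
print(report)
```

With a real DataFrame, `mixed_type_columns(df)` should work the same way, since `DataFrame.items()` yields (column name, Series) pairs. Once a suspect column is known, the offending rows can be inspected with something like `df[df["grade"].map(type) == str]`, where `"grade"` is a placeholder column name.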
      
      

       

          People

            Assignee: Wes McKinney (wesm)
            Reporter: Christopher Brooks (brooksch)