Details
Bug
Status: Resolved
Major
Resolution: Fixed
0.9.0
None
linux
Description
I have a large pandas DataFrame. When I try to convert it to a pyarrow Table, the conversion fails with an ArrowInvalid error. I'm not sure whether this is a bug or expected behavior.
I realize the code below, which only shows the error, is not much use on its own. What can I do to help identify the cause in my pandas DataFrame?
Here's the error:
In [17]: pa.Table.from_pandas(df)
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-17-6eac5d0eec08> in <module>()
----> 1 pa.Table.from_pandas(df)

table.pxi in pyarrow.lib.Table.from_pandas()

~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads)
    375         arrays = list(executor.map(convert_column,
    376                                    columns_to_convert,
--> 377                                    convert_types))
    378
    379     types = [x.type for x in arrays]

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
    584                     # Careful not to keep a reference to the popped future
    585                     if timeout is None:
--> 586                         yield fs.pop().result()
    587                     else:
    588                         yield fs.pop().result(end_time - time.time())

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    423                 raise CancelledError()
    424             elif self._state == FINISHED:
--> 425                 return self.__get_result()
    426
    427             self._condition.wait(timeout)

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
     54
     55         try:
---> 56             result = self.fn(*self.args, **self.kwargs)
     57         except BaseException as exc:
     58             self.future.set_exception(exc)

~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, ty)
    364
    365     def convert_column(col, ty):
--> 366         return pa.array(col, from_pandas=True, type=ty)
    367
    368     if nthreads == 1:

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Double: Got Python object of type str but can only handle these types: float

In [18]: pa.__version__
Out[18]: '0.9.0'

In [19]: pd.__version__
Out[19]: '0.23.3'
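
The error message suggests that some column contains both float and str values. One possible way to narrow this down, sketched here with pandas only, is to scan the object-dtype columns and report those that hold more than one Python type. The helper name `find_mixed_type_columns` and the example frame are hypothetical, made up for illustration; they are not part of pandas or pyarrow, and the frame is not the reporter's data.

```python
import pandas as pd


def find_mixed_type_columns(df):
    """Map each object-dtype column holding several Python types to its type counts.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    """
    mixed = {}
    for name, col in df.items():
        # Only object-dtype columns can hold arbitrary Python objects.
        if col.dtype != object:
            continue
        # Ignore missing values (None/NaN), then count values per Python type name.
        counts = col.dropna().map(lambda v: type(v).__name__).value_counts()
        if len(counts) > 1:
            mixed[name] = counts.to_dict()
    return mixed


# Made-up example frame: "bad" mixes floats with a string.
example = pd.DataFrame({
    "ok": [1.0, 2.0, 3.0],
    "bad": [1.0, "n/a", 2.5],
})
print(find_mixed_type_columns(example))
```

Once a suspect column is known, the offending rows could then be inspected with something like `df[df["bad"].map(type) == str]`, where `"bad"` stands for whatever column name the scan reports.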