Apache Arrow / ARROW-5665

[Python] ArrowInvalid on converting Pandas Series with dtype float64


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

      ('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64')

      We are seeing frequent intermittent errors when converting pandas DataFrames to Parquet files using pyarrow. Rerunning the same code often succeeds without any error.

      We use this line of code for the conversion:

      dataframe.to_parquet(filePath, compression="snappy", index=False)

      Note: `filePath` is an AWS S3 URI.

      ArrowInvalid: ('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64')
       File "store_manager.py", line 25, in _write_files_and_partitions
       dataframe.to_parquet(filePath, compression="snappy", index=False)
       File "pandas/core/frame.py", line 2203, in to_parquet
       partition_cols=partition_cols, **kwargs)
       File "pandas/io/parquet.py", line 252, in to_parquet
       partition_cols=partition_cols, **kwargs)
       File "pandas/io/parquet.py", line 113, in write
       table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
       File "pyarrow/table.pxi", line 1139, in pyarrow.lib.Table.from_pandas
       names, arrays, metadata = dataframe_to_arrays(
       File "pyarrow/pandas_compat.py", line 474, in dataframe_to_arrays
       convert_types))
       File "concurrent/futures/_base.py", line 586, in result_iterator
       yield fs.pop().result()
       File "concurrent/futures/_base.py", line 425, in result
       return self.__get_result()
       File "concurrent/futures/_base.py", line 384, in __get_result
       raise self._exception
       File "concurrent/futures/thread.py", line 57, in run
       result = self.fn(*self.args, **self.kwargs)
       File "pyarrow/pandas_compat.py", line 463, in convert_column
       raise e
       File "pyarrow/pandas_compat.py", line 457, in convert_column
       return pa.array(col, type=ty, from_pandas=True, safe=safe)
       File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
       return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
       File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
       check_status(ConvertPySequence(sequence, mask, options, &out))
       File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
       raise ArrowInvalid(message)

          People

            Assignee: Unassigned
            Reporter: Thibaud Nesztler (tnesztler)
            Votes: 0
            Watchers: 3
