[ARROW-2160] [C++/Python] Fix decimal precision inference - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.0
Fix Version/s: 0.9.0
Component/s: C++, Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/18126

Description

import pyarrow as pa
import pandas as pd
import decimal

df = pd.DataFrame({'a': [decimal.Decimal('0.1'), decimal.Decimal('0.01')]})
pa.Table.from_pandas(df)

raises:

pyarrow.lib.ArrowInvalid: Decimal type with precision 2 does not fit into precision inferred from first array element: 1

It looks like Arrow infers the maximum precision for a column from its first cell and expects the remaining values to fit. I understand this is by design, but from the point of view of pandas-Arrow compatibility it is quite painful, since pandas is more flexible (as demonstrated above).
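To make the mismatch concrete, the short sketch below inspects the two values from the reproduction using only the standard `decimal` module. It shows that the second value has a smaller exponent (more fractional digits) than the first, which is consistent with the precisions 1 and 2 named in the error message. How pyarrow derives those numbers internally is an assumption here, and the name `parts` is only illustrative.

```python
import decimal

# Break each value from the reproduction into its digits and exponent.
# `parts` is an illustrative name, not anything defined by pyarrow.
parts = {text: decimal.Decimal(text).as_tuple() for text in ("0.1", "0.01")}

for text, (sign, digits, exponent) in parts.items():
    # A negative exponent is the number of digits after the decimal point.
    print(text, "digits:", digits, "exponent:", exponent)
```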

What this means is that a user trying to pass a pandas DataFrame with Decimal column(s) to an Arrow Table would always first have to:

1. Find the highest precision used in each such column.
2. Adjust the first cell of each such column so that it explicitly uses the highest precision of that column.
3. Only then pass the DataFrame to Table.from_pandas().
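The manual procedure above could be sketched roughly as follows, using only the standard `decimal` module. The helper names `required_precision_and_scale` and `pad_to_common_scale` are hypothetical and not part of pyarrow or pandas. The sketch pads every value (not just the first cell) to the column's largest scale, which is a slightly broader variant of step 2, and it assumes the values fit within the default `decimal` context precision.

```python
import decimal


def required_precision_and_scale(values):
    """Return a (precision, scale) pair wide enough for every finite Decimal.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    max_int_digits = 0
    max_scale = 0
    for value in values:
        if not value.is_finite():
            continue  # NaN and infinities carry no digit information
        _, digits, exponent = value.as_tuple()
        # Digits after the decimal point (never negative thanks to the 0 start).
        max_scale = max(max_scale, -exponent)
        # Digits before the decimal point.
        max_int_digits = max(max_int_digits, len(digits) + exponent)
    return max_int_digits + max_scale, max_scale


def pad_to_common_scale(values):
    """Quantize every finite value to the largest scale found in the column."""
    _, scale = required_precision_and_scale(values)
    quantum = decimal.Decimal(1).scaleb(-scale)  # e.g. Decimal('0.01') for scale 2
    return [v.quantize(quantum) if v.is_finite() else v for v in values]


column = [decimal.Decimal("0.1"), decimal.Decimal("0.01")]
precision, scale = required_precision_and_scale(column)
padded = pad_to_common_scale(column)
```

The padded values could then be put back into the DataFrame column before calling Table.from_pandas(). Whether this avoids the error on a given pyarrow version has not been verified here.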

So, given this unavoidable procedure (and assuming Arrow needs to be strict about the highest precision for a column), shouldn't similar logic be part of Table.from_pandas() itself, to make this transparent?

Issue Links

links to

GitHub Pull Request #1618

People

Assignee: Phillip Cloud

Reporter: Antony Mayi

Votes: 0

Watchers: 6

Dates

Created: 14/Feb/18 23:09

Updated: 11/Jan/23 07:19

Resolved: 01/Mar/18 22:28