Apache Arrow / ARROW-12150

[Python] Bad type inference of mixed-precision Decimals

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 5.0.0
    • Component/s: Python
    • Environment:
      - macOS Big Sur 11.2.1
      - Python 3.8.2

    Description

      Writing a pyarrow.Table that contains mixed-precision Decimals with parquet.write_table produces a Parquet file that contains invalid values.

      In the example below, the first value of data_decimal changes from Decimal('579.11999511718795474735088646411895751953125000000000') in the pyarrow table to Decimal('-378.68971792399258172661600550482428224218070136475136') in the Parquet file.

       

      import pyarrow
      import pyarrow.parquet  # the submodule must be imported explicitly
      from decimal import Decimal

      values_floats = [579.119995117188, 6.40999984741211, 2.0]  # floats
      decs_from_values = [Decimal(v) for v in values_floats]  # exact binary expansion of each float
      decs_from_float = [Decimal.from_float(v) for v in values_floats]  # same values as above
      decs_str = [Decimal(str(v)) for v in values_floats]  # short decimal representation

      data_dict = {
          "data_decimal": decs_from_values,  # python Decimal
          "data_decimal_from_float": decs_from_float,
          "data_float": values_floats,  # python floats
          "data_dec_str": decs_str,
      }

      table = pyarrow.table(data=data_dict)
      print(table.to_pydict())  # before saving
      pyarrow.parquet.write_table(table, "./pyarrow_decimal.parquet")  # saving
      print(pyarrow.parquet.read_table("./pyarrow_decimal.parquet").to_pydict())  # after saving
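
      To see why mixed-precision input is awkward for type inference, it can help to compute the precision and scale that a single decimal column would need to hold every value exactly. The sketch below uses only the standard library; the helper names precision_and_scale and common_decimal_type are hypothetical and this is not Arrow's actual inference code. The 38-digit limit is the maximum precision of Arrow's decimal128 type.

```python
from decimal import Decimal
from typing import Iterable, Tuple

# Maximum number of digits an Arrow decimal128 value can hold.
DECIMAL128_MAX_PRECISION = 38


def precision_and_scale(value: Decimal) -> Tuple[int, int]:
    """Return (precision, scale) needed to store one finite Decimal exactly.

    Hypothetical helper for illustration only.
    """
    _, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and Infinity have no precision or scale")
    if exponent >= 0:
        # Positive exponent: the value is an integer with trailing zeros.
        return len(digits) + exponent, 0
    scale = -exponent
    # Digits left of the decimal point (0 for values such as 0.001).
    integer_digits = max(len(digits) - scale, 0)
    return integer_digits + scale, scale


def common_decimal_type(values: Iterable[Decimal]) -> Tuple[int, int]:
    """Return the smallest (precision, scale) that fits every value."""
    max_integer_digits = 0
    max_scale = 0
    for value in values:
        precision, scale = precision_and_scale(value)
        max_integer_digits = max(max_integer_digits, precision - scale)
        max_scale = max(max_scale, scale)
    return max_integer_digits + max_scale, max_scale


if __name__ == "__main__":
    floats = [579.119995117188, 6.40999984741211, 2.0]
    for label, decs in [
        ("Decimal(float)", [Decimal(v) for v in floats]),
        ("Decimal(str(float))", [Decimal(str(v)) for v in floats]),
    ]:
        precision, scale = common_decimal_type(decs)
        fits = precision <= DECIMAL128_MAX_PRECISION
        print(f"{label}: precision={precision}, scale={scale}, fits decimal128={fits}")
```

      Note that the combined precision is the largest integer-digit count plus the largest scale, not the largest per-value precision. Values built with Decimal(float) carry the full binary expansion of the float, so the precision they require can be far larger than for the Decimal(str(float)) column.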
      

       

            People

              Assignee: Joris Van den Bossche (jorisvandenbossche)
              Reporter: abdel alfahham (alfahham)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h 20m