Apache Arrow / ARROW-2153

[C++/Python] Decimal conversion not working for exponential notation

Details

    Description

      import pyarrow as pa
      import pandas as pd
      import decimal
      
      pa.Table.from_pandas(pd.DataFrame({'a': [decimal.Decimal('1.1'), decimal.Decimal('2E+1')]}))
      

       

      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "pyarrow/table.pxi", line 875, in pyarrow.lib.Table.from_pandas (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:44927)
        File "/home/skadlec/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 350, in dataframe_to_arrays
          convert_types)]
        File "/home/skadlec/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 349, in <listcomp>
          for c, t in zip(columns_to_convert,
        File "/home/skadlec/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 345, in convert_column
          return pa.array(col, from_pandas=True, type=ty)
        File "pyarrow/array.pxi", line 170, in pyarrow.lib.array (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:29224)
        File "pyarrow/array.pxi", line 70, in pyarrow.lib._ndarray_to_array (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:28465)
        File "pyarrow/error.pxi", line 77, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8270)
      pyarrow.lib.ArrowInvalid: Expected base ten digit or decimal point but found 'E' instead.
      

      When writing values by hand we can of course use decimal.Decimal('20') instead of decimal.Decimal('2E+1'), but arithmetic inside an application can produce exponential notation outside our control (it is in fact the normalized form of the decimal number). In addition, for some values exponential notation is the only form that expresses the significance, so it should be accepted.
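
To illustrate the point above, here is a minimal sketch using only the standard-library decimal module (no pyarrow involved). It shows that normalize() and ordinary multiplication can both yield a value whose string form uses exponential notation, and that such a value carries fewer significant digits than its plain-digit equivalent. The variable names are illustrative only.

```python
import decimal

# normalize() strips trailing zeros, which can move them into the exponent.
twenty = decimal.Decimal('20')
normalized = twenty.normalize()
print(repr(normalized))  # Decimal('2E+1')

# The two values compare equal but differ in coefficient digits and exponent.
print(twenty == normalized)   # True
print(twenty.as_tuple())      # digits=(2, 0), exponent=0
print(normalized.as_tuple())  # digits=(2,), exponent=1

# Arithmetic keeps the positive exponent, so the result prints as 6E+1.
product = decimal.Decimal('2E+1') * decimal.Decimal('3')
print(product)  # 6E+1
```

Here Decimal('2E+1') has one significant digit while Decimal('20') has two, so rewriting one as the other changes the significance even though the numeric value is the same.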

      The decimal documentation suggests the following transformation, but it is only usable when the significance information doesn't need to be kept:

      from decimal import Decimal

      def remove_exponent(d):
          return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
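
As a rough sketch of why this transformation is not a general answer, the snippet below applies remove_exponent to the value from the report and compares the number of coefficient digits before and after. It uses only the standard-library decimal module; the variable names are illustrative.

```python
from decimal import Decimal

def remove_exponent(d):
    # Integral values are rewritten with exponent 0; others are normalized.
    return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()

original = Decimal('2E+1')             # one significant digit
converted = remove_exponent(original)

print(converted)                         # 20
print(len(original.as_tuple().digits))   # 1
print(len(converted.as_tuple().digits))  # 2

# For non-integral values, trailing zeros are dropped instead.
print(remove_exponent(Decimal('1.10')))  # 1.1
```

The converted value contains only digits, which is the form the parser in the traceback expects, but it now claims two significant digits where the original had one.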
      

            People

              cpcloud Phillip Cloud
              antonymayi Antony Mayi