Apache Arrow / ARROW-1957

[Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version: 0.8.0
    • Fix Version: 0.14.0
    • Component: Python
    • Environment: Python 3.6.4, Mac OS X and CentOS Linux release 7.3.1611, pandas 0.21.1.

    Description

      The following code

      import pyarrow as pa
      import pyarrow.parquet as pq
      import pandas as pd
      
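      n = 3
      # Index of n timestamps spaced one nanosecond apart. pd.date_range replaces the
      # DatetimeIndex(start=..., freq=..., periods=...) form, which later pandas releases removed.
      df = pd.DataFrame({'x': range(n)}, index=pd.date_range(start='2017-01-01', freq='1n', periods=n))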
      pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet')

      results in:

      ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1483228800000000001
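
The value in the message is the second index entry: 2017-01-01T00:00:00 UTC is 1483228800000000000 ns since the epoch, so one nanosecond later is 1483228800000000001, which is not a whole number of microseconds. The following is a minimal pure-Python sketch of that divisibility check. The helper names are hypothetical and this is not Arrow's implementation, only an illustration of why the cast is reported as lossy.

```python
NS_PER_US = 1_000  # nanoseconds in one microsecond


def ns_to_us_checked(value_ns: int) -> int:
    """Convert ns since the epoch to us, refusing any lossy conversion."""
    quotient, remainder = divmod(value_ns, NS_PER_US)
    if remainder != 0:
        raise ValueError(f"casting {value_ns} ns to us would lose data")
    return quotient


def first_lossy(values):
    """Return the first value that is not a whole number of microseconds, or None."""
    return next((v for v in values if v % NS_PER_US != 0), None)


base_ns = 1_483_228_800_000_000_000  # 2017-01-01T00:00:00 UTC in ns since the epoch

# Three timestamps one nanosecond apart, like the index in the report.
nanosecond_index = [base_ns + i for i in range(3)]
# Three timestamps one microsecond apart; each is an exact multiple of 1000 ns.
microsecond_index = [base_ns + i * NS_PER_US for i in range(3)]

print(first_lossy(nanosecond_index))
print(first_lossy(microsecond_index))
```

The first entry of the nanosecond index converts cleanly; the check only fails from the second entry on, which matches the value quoted in the error.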

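      The desired behavior is that nanosecond-resolution timestamps can be saved without losing precision (e.g. without a lossy conversion to a coarser unit such as ms). Note that if freq='1u' is used, the code runs properly, because every timestamp is then a whole number of microseconds.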

            People

              Assignee: TP Boudreau (tpboudreau)
              Reporter: Jordan Samuels (jordansamuels)