Apache Arrow / ARROW-4967

[C++] Parquet: Object type and stats lost when using 96-bit timestamps

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.12.1
    • Fix Version/s: None
    • Component/s: C++, Python
    • Environment: PyArrow: 0.12.1
      Python: 2.7.15, 3.7.2
      Pandas: 0.24.2

    Description

      Run the following code:

      import datetime as dt
      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq
      
      dataframe = pd.DataFrame({'foo': [dt.datetime.now()]})
      table = pa.Table.from_pandas(dataframe, preserve_index=False)
      
      pq.write_table(table, 'int64.parq')
      pq.write_table(table, 'int96.parq', use_deprecated_int96_timestamps=True)
      

      Examining the int64.parq file, we see that the column metadata includes an object type of TIMESTAMP_MICROS and also gives some stats. All is well.

      file schema: schema 
      --------------------------------------------------------------------------------
      foo:         OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1
      
      row group 1: RC:1 TS:76 OFFSET:4 
      --------------------------------------------------------------------------------
      foo:          INT64 SNAPPY ... ST:[min: 2019-12-31T23:59:59.999000, max: 2019-12-31T23:59:59.999000, num_nulls: 0]
      

      However, in int96.parq that metadata appears to be lost: there is no object type and there are no column stats.

      file schema: schema 
      --------------------------------------------------------------------------------
      foo:         OPTIONAL INT96 R:0 D:1
      
      row group 1: RC:1 TS:58 OFFSET:4 
      --------------------------------------------------------------------------------
      foo:          INT96 SNAPPY ... ST:[no stats for this column]
      

      This is a bit confusing, since the metadata for the exact same data can look different depending on whether a seemingly unrelated flag is set.

          People

            Assignee: Unassigned
            Reporter: Diego Argueta (yiannisliodakis)
            Votes: 0
            Watchers: 5
