  1. Apache Arrow
  2. ARROW-7350

[Python] Parquet file metadata min and max statistics not decoded from bytes for Decimal data types

Details

    Description

      Parquet file metadata for Decimal-type columns contains min and max statistics that are left as raw bytes instead of being decoded into Decimal values. This causes issues in dependent libraries such as Dask (see https://github.com/dask/dask/issues/5647).

       

      Reproducible example
      from decimal import Decimal
      import random
      
      import pandas as pd
      import pyarrow.parquet as pq
      import pyarrow as pa
      
      NUM_DATA_POINTS_PER_PARTITION = 25
      
      random.seed(0)
      data1 = [
          {"col1": Decimal(f"{random.randint(0, 999)}.{random.randint(0, 99)}")}
          for _ in range(NUM_DATA_POINTS_PER_PARTITION)
      ]
      
      df = pd.DataFrame(data1)
      table = pa.Table.from_pandas(df)
      pq.write_table(table, 'my_data.parquet')
      
      parquet_file = pq.ParquetFile('my_data.parquet')
      
      assert isinstance(parquet_file.metadata.row_group(0).column(0).statistics.min, Decimal) # <-- AssertionError here because min has type bytes rather than Decimal
      assert isinstance(parquet_file.metadata.row_group(0).column(0).statistics.max, Decimal)
      
      

            People

              wjones127 Will Jones
                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h