  1. Apache Arrow
  2. ARROW-8657

[Python][C++][Parquet] Forward compatibility issue from 0.16 to 0.17 when using version='2.0'

Details

    Description

      As of the 0.17 release, ParquetVersion defines both the logical type interpretation of fields and the selection of the DataPage format.

      As a result, all Parquet files created with ParquetVersion::V2 to get features such as unsigned int32s, timestamps with nanosecond resolution, etc. are not forward compatible (they cannot be read with 0.16.0). In my case, that is terabytes of data.

      Those two concerns should be separated. Given that DataPageV2 pages were not written prior to 0.17, and in order to allow reading existing files, the existing version property should continue to operate as in 0.16 and inform only the logical type mapping.
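
The separation proposed above could look roughly like the following standalone sketch. It does not use the Arrow or pyarrow API: every name in it (`FormatVersion`, `DataPageVersion`, `WriterSettings`, `storage_for_uint32`, `page_header_kind`) is hypothetical, and the uint32 mapping shown is an illustrative assumption about how a writer might treat the two format versions, not a statement of the library's actual behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class FormatVersion(Enum):
    """Hypothetical setting: governs logical type mapping only."""
    V1 = "1.0"
    V2 = "2.0"


class DataPageVersion(Enum):
    """Hypothetical setting: governs the data page layout only."""
    V1 = "1.0"
    V2 = "2.0"


@dataclass(frozen=True)
class WriterSettings:
    # Behaves like the pre-0.17 "version" property: logical types only.
    format_version: FormatVersion = FormatVersion.V1
    # Chosen independently; stays on V1 pages unless explicitly requested,
    # so older readers can still decode the pages.
    data_page_version: DataPageVersion = DataPageVersion.V1


def storage_for_uint32(settings: WriterSettings) -> Tuple[str, Optional[str]]:
    """Return (physical type, logical annotation) for a uint32 column.

    Assumed mapping for illustration: V2 keeps a 32-bit unsigned
    annotation, V1 widens to a plain 64-bit integer.
    """
    if settings.format_version is FormatVersion.V2:
        return ("INT32", "UINT_32")
    return ("INT64", None)


def page_header_kind(settings: WriterSettings) -> str:
    """Return which page header the writer would emit."""
    if settings.data_page_version is DataPageVersion.V2:
        return "DataPageV2"
    return "DataPage"


# Requesting V2 logical types does not implicitly switch the page format.
settings = WriterSettings(format_version=FormatVersion.V2)
uint32_storage = storage_for_uint32(settings)
header = page_header_kind(settings)
```

With the two settings decoupled, a writer asked for V2 logical types still emits V1 data pages by default, which is the behavior the 0.16 reader expects.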

      Some consideration should be given to issuing a 0.17.1 release.

            People

              emkornfield@gmail.com Micah Kornfield
              belzilep Pierre Belzile
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m