Parquet / PARQUET-2420

ThriftParquetWriter converts thrift byte to int32 without adding logical type

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14.0
    • Component/s: parquet-thrift
    • Labels: None

    Description

      When a Parquet schema is generated from a Thrift definition, Thrift byte fields are converted to INT32 without the LogicalType annotation that records their width. The Parquet file therefore loses the information that the field is 8 bits wide, which is inconsistent with i16 fields, whose annotation is preserved. The correct conversion is INT32 with LogicalType metadata INTEGER(8,true), that is, a bit width of 8 and signed set to true.
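To make the lost information concrete, here is a minimal sketch of the value range implied by an INTEGER(bitWidth, signed) annotation. The helper name `int_range` is hypothetical and is not part of parquet-thrift; the sketch assumes that an unannotated int32 column is read as a plain signed 32-bit integer.

```python
def int_range(bit_width, signed):
    """Return the (min, max) values representable by INTEGER(bit_width, signed)."""
    if signed:
        half = 1 << (bit_width - 1)
        return -half, half - 1
    return 0, (1 << bit_width) - 1


# Range a reader can assume when the column carries INTEGER(8,true).
annotated = int_range(8, True)

# Range a reader must assume for int32 with no annotation (assumption above).
unannotated = int_range(32, True)

print("INTEGER(8,true):", annotated)
print("plain int32:    ", unannotated)
```

Without the annotation, a reader has no way to tell that the column only ever holds values that fit in a single signed byte.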

       

      Thrift Definition

      struct TestLogicalType {
        1: required i16 test_i16,
        2: required byte test_i8
      }

      Current Parquet Schema Generated

      message ParquetSchema {
        required int32 test_i16 (INTEGER(16,true)) = 1;
        required int32 test_i8 = 2;
      } 

      Expected Parquet Schema 

      message ParquetSchema {
        required int32 test_i16 (INTEGER(16,true)) = 1;
        required int32 test_i8 (INTEGER(8,true)) = 2;
      } 
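The expected mapping can be sketched as a small lookup table. This is an illustration only: `THRIFT_TO_PARQUET` and `render_schema` are hypothetical names, the table covers just the two Thrift types that appear in this report, and none of it is parquet-thrift's actual conversion code.

```python
# Hypothetical mapping for illustration: Thrift type -> (physical type, annotation).
# Only the two types from this report are covered.
THRIFT_TO_PARQUET = {
    "byte": ("int32", "INTEGER(8,true)"),
    "i16": ("int32", "INTEGER(16,true)"),
}


def render_schema(name, fields):
    """Render (field_id, thrift_type, field_name) tuples as a Parquet message."""
    lines = [f"message {name} {{"]
    for field_id, thrift_type, field_name in fields:
        physical, logical = THRIFT_TO_PARQUET[thrift_type]
        lines.append(
            f"  required {physical} {field_name} ({logical}) = {field_id};"
        )
    lines.append("}")
    return "\n".join(lines)


schema = render_schema(
    "ParquetSchema",
    [(1, "i16", "test_i16"), (2, "byte", "test_i8")],
)
print(schema)
```

Run against the fields of the Thrift struct in this report, the sketch prints the expected schema, with both fields stored as int32 and each carrying its own bit-width annotation.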

            People

              shreyas-dview Shreyas B