Parquet / PARQUET-2100

Merging two valid parquet files produces a corrupted result file in 1.12.1

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.1, 1.12.2
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

    Description

      This ticket relates to PARQUET-2027. In that ticket, merging two parquet files produced by 1.11.x failed in 1.12.0. In 1.12.1 the merge was fixed, i.e. it no longer fails, but it now produces a corrupted output file. The error:

      Dictionary page must be before data page.
      

      is thrown when one tries to read the output file. The error originates here: https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712.
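
For readers unfamiliar with this error, here is a minimal Python sketch of the kind of ordering check that could produce it. It is an illustration only, not the parquet-cpp code. It assumes the reader raises the error when it reaches a dictionary-encoded data page before any dictionary page has been read in the same column chunk. The `Page` type and `check_page_order` function are hypothetical names made up for this sketch, and the sample page sequences are invented rather than taken from the attached files.

```python
from typing import NamedTuple, Sequence


class Page(NamedTuple):
    """Hypothetical, simplified stand-in for a page header in a column chunk."""
    kind: str                         # "dictionary" or "data"
    dictionary_encoded: bool = False  # only meaningful for data pages


def check_page_order(pages: Sequence[Page]) -> None:
    """Raise ValueError if a dictionary-encoded data page precedes the dictionary page.

    Assumption: this mirrors the reader's behaviour only loosely.
    """
    dictionary_seen = False
    for page in pages:
        if page.kind == "dictionary":
            dictionary_seen = True
        elif page.dictionary_encoded and not dictionary_seen:
            # The data page needs a dictionary that has not been read yet.
            raise ValueError("Dictionary page must be before data page.")


# Invented examples: a well-formed chunk and one with the dictionary page misplaced.
valid_chunk = [Page("dictionary"), Page("data", True), Page("data", True)]
broken_chunk = [Page("data", True), Page("dictionary"), Page("data", True)]

check_page_order(valid_chunk)  # passes silently

try:
    check_page_order(broken_chunk)
except ValueError as err:
    print(err)
```

Whether the merged output file actually has its pages in such an order has not been confirmed here; the sketch only shows what the error message implies about the expected layout.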

      I have attached two example input files and the output file produced by merging them.

       

      Attachments

        1. output_file.parquet
          9 kB
          Matthew M
        2. input_file2.parquet
          8 kB
          Matthew M
        3. input_file1.parquet
          8 kB
          Matthew M

          People

            Assignee: Unassigned
            Reporter: Matthew M (eltherion)