[C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: C++, Python
    • Labels: None

    Description

      If I read the attached CSV file with pyarrow, change the float64 columns to float32, and export the result to Parquet, the Parquet file gets corrupted. It is no longer readable by Apache Drill or Parquet.Net.

       

      Update: The bug is in the "Dictionary Encoding" feature. If I switch it off for the float32 columns, everything works as expected.

      Attachments

        1. parquet-dotnet.csv
          54 kB
          Matthias Rosenthaler
        2. output.parquet
          6.75 MB
          Matthias Rosenthaler
        3. output.csv
          39.30 MB
          Matthias Rosenthaler
        4. image-2021-02-15-15-49-41-908.png
          34 kB
          Matthias Rosenthaler
        5. foo.parquet
          9.02 MB
          Micah Kornfield
        6. drill_query.csv
          52 kB
          Matthias Rosenthaler

          People

            Assignee: Unassigned
            Reporter: Matthias Rosenthaler (matthros)