Apache Arrow / ARROW-10406

[C++] Unify dictionaries when writing IPC file in a single shot


Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: C++

    Description

      I read a large (taxi) CSV file and specified that some columns should be dictionary-encoded. The resulting Table has ChunkedArrays with 1604 chunks. When I try to write this Table to the IPC file format (write_feather), I get an error:

        Invalid: Dictionary replacement detected when writing IPC file format. Arrow IPC files only support a single dictionary for a given field accross all batches.
      

      I can write this Table to Parquet and read it back in, and the data round-trips correctly. We should be able to do this in IPC too.
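
      To illustrate what unifying dictionaries means here, the following is a minimal pure-Python sketch and not Arrow's actual implementation. It assumes each chunk is modelled as a (dictionary, indices) pair; the function name unify_dictionaries and the sample payment values are hypothetical. Each chunk's indices are remapped onto one shared dictionary, which is the shape the IPC file format requires for a field.

```python
from typing import Hashable, List, Sequence, Tuple

# Hypothetical model of a dictionary-encoded chunk:
# (dictionary, indices), where indices[i] points into dictionary.
Chunk = Tuple[List[Hashable], List[int]]


def unify_dictionaries(
    chunks: Sequence[Chunk],
) -> Tuple[List[Hashable], List[List[int]]]:
    """Merge per-chunk dictionaries into one and remap every chunk's indices."""
    unified: List[Hashable] = []   # the single shared dictionary
    position = {}                  # value -> index in the shared dictionary
    remapped: List[List[int]] = []

    for dictionary, indices in chunks:
        # transpose[old_index] gives the index in the shared dictionary.
        transpose = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            transpose.append(position[value])
        remapped.append([transpose[i] for i in indices])

    return unified, remapped


# Two chunks of the same column, each with its own dictionary (made-up values).
chunks = [
    (["cash", "credit"], [0, 1, 0]),
    (["credit", "dispute"], [1, 0, 0]),
]
unified, remapped = unify_dictionaries(chunks)

# Decoding through the shared dictionary yields the original values.
decoded = [[unified[i] for i in idx] for idx in remapped]
```

      After this step every chunk refers to the same dictionary, so a writer would need to emit it only once. The sketch ignores nulls and nested types, which a real implementation would have to handle.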

            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: Neal Richardson (npr)
              Votes: 1
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 50m