Apache Arrow / ARROW-11838

[C++] Support reading IPC data with shared dictionaries

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: C++

    Description

      The Arrow format allows a single dictionary to be reused across multiple columns. At present, the C++ implementation (and therefore Python) supports neither reading nor writing such data.
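
As a rough illustration of what reading shared dictionaries involves, here is a minimal pure-Python sketch. It is a conceptual model only, not the Arrow C++ or pyarrow API: the `Field` class, the `decode_columns` helper and the sample data are all hypothetical. It assumes that each dictionary-encoded field names its dictionary by an integer id, so two fields carrying the same id resolve their indices against the same values.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a dictionary-encoded schema field."""
    name: str
    dictionary_id: int  # id of the dictionary this column's indices refer to


def decode_columns(fields, dictionaries, index_columns):
    """Resolve each column's indices against the dictionary its field names.

    Fields sharing a dictionary_id look up the same list of values,
    so the dictionary is stored once and reused.
    """
    decoded = {}
    for field in fields:
        values = dictionaries[field.dictionary_id]
        decoded[field.name] = [
            None if index is None else values[index]
            for index in index_columns[field.name]
        ]
    return decoded


# Two columns that both point at dictionary id 0.
fields = [Field("origin", 0), Field("destination", 0)]
dictionaries = {0: ["AMS", "BRU", "LHR"]}
index_columns = {"origin": [0, 1, 1], "destination": [2, 0, None]}

decoded = decode_columns(fields, dictionaries, index_columns)
```

In this model the reader only needs a lookup keyed by dictionary id rather than by field, which is roughly why the read side is expected to be the simpler half of the problem.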

      Supporting writing would require extending the interfaces, but reading might be a relatively quick win. This issue is concerned with reading only.

      The Java implementation supports writing, so it can be used to generate test data.

      Once the C++ side is handled, pyarrow should automatically be able to convert such tables to pandas, using categoricals where appropriate. This should be verified, and addressed if it does not work.



          People

            Assignee: Joris Peeters (jmgpeeters)
            Reporter: Joris Peeters (jmgpeeters)
            Votes: 0
            Watchers: 4


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h
