Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
The Arrow format allows a single dictionary to be reused across multiple columns. At present, the C++ implementation (and therefore Python) supports neither reading nor writing such shared dictionaries.
Support for writing these would require extending the interfaces, but reading might be a relatively quick win, so this issue is concerned with reading only.
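To make the reading side concrete, here is a conceptual sketch in plain Python of what a reader has to do: dictionaries are delivered once, keyed by an id, and each dictionary-encoded field names the id it uses, so two fields can refer to the same one. The column names, the values and the `decode_column` helper are made up for illustration; this is not the Arrow C++ or pyarrow API.

```python
# Conceptual sketch only: plain Python stand-ins, not the Arrow C++ or pyarrow API.

# Dictionaries arrive once, keyed by id.
dictionaries = {0: ["red", "green", "blue"]}

# Hypothetical schema: each dictionary-encoded field records the dictionary id it uses.
# Both columns point at id 0, so the dictionary is shared between them.
field_dictionary_ids = {"shirt_color": 0, "hat_color": 0}

# Column data holds only integer indices into the dictionary (None marks a null).
encoded_columns = {
    "shirt_color": [0, 2, 2, None],
    "hat_color": [1, 1, 0, 2],
}


def decode_column(name):
    """Resolve a column's indices against the dictionary its field refers to."""
    values = dictionaries[field_dictionary_ids[name]]
    return [None if i is None else values[i] for i in encoded_columns[name]]


decoded = {name: decode_column(name) for name in encoded_columns}
print(decoded)
```

The point of the sketch is that a reader must look dictionaries up by id per field rather than assume one dictionary per column.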
The Java implementation supports writing, so it can be used to generate test data.
Once the C++ side is handled, pyarrow should automatically be able to convert such tables to pandas, using categoricals where appropriate. This should be verified, and fixed if it does not work.