Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
ARROW-13358, ARROW-14167, ARROW-13573, and related issues added support for dictionary types in various generic 'selection'-style kernels. However, that support is fairly unoptimized: it repeatedly hashes and inserts the same values. We could implement more optimized support in some cases.
For example, for if_else, we could first unify both dictionaries (in such a way that the first dictionary is unchanged), then copy-with-transpose the indices of the second argument, then copy the relevant indices of the first argument (which doesn't require transposition). This would require DictionaryUnifier to support nulls in dictionaries, however.
Issue Links
- is related to
  - ARROW-14649 [R] Include unused factor levels in coalesce() and if_else() output (Open)
- relates to
  - ARROW-13358 [C++] Extend type support for if_else kernel (Resolved)
  - ARROW-13573 [C++] Support dictionaries directly in case_when kernel (Resolved)
  - ARROW-14167 [C++] Support dictionaries directly in coalesce kernel (Resolved)