Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 0.17.0
Description
It might be useful to create a DictionaryArray that uses the same dictionary keys as another array. One use case would be more efficient comparison between arrays when it is known that they use the same dictionary. Another would be more efficient grouping operations across multiple chunks (i.e. a `Vec<DictionaryArray>`).
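To make the comparison use case concrete, here is a minimal sketch using only the standard library. `DictArray` and `eq_elementwise` are hypothetical stand-ins, not part of the arrow crate, and the sketch assumes the shared dictionary contains no duplicate values (otherwise two different keys could refer to equal strings).

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a dictionary-encoded string array.
/// `keys[i]` indexes into `values`; `None` marks a null slot.
#[derive(Debug, Clone)]
pub struct DictArray {
    pub keys: Vec<Option<u32>>,
    pub values: Rc<Vec<String>>,
}

/// Element-wise equality of two arrays; returns `None` on length mismatch.
/// If both arrays point at the same dictionary allocation, only the integer
/// keys are compared. Otherwise each slot falls back to a string comparison.
/// Assumption: a shared dictionary holds no duplicate values.
pub fn eq_elementwise(a: &DictArray, b: &DictArray) -> Option<Vec<Option<bool>>> {
    if a.keys.len() != b.keys.len() {
        return None;
    }
    let shared = Rc::ptr_eq(&a.values, &b.values);
    let result = a
        .keys
        .iter()
        .zip(b.keys.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(if shared {
                // Fast path: same dictionary, so equal keys mean equal values.
                x == y
            } else {
                // Slow path: resolve both keys and compare the strings.
                a.values[*x as usize] == b.values[*y as usize]
            }),
            // A null on either side yields a null result.
            _ => None,
        })
        .collect();
    Some(result)
}
```

With a shared dictionary the inner loop never touches string data, which is where the saving would come from.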
A possible implementation could look like this:
impl<K> StringDictionaryBuilder<K>
where
    K: ArrowDictionaryKeyType,
{
    pub fn new_with_dictionary(
        keys_builder: PrimitiveBuilder<K>,
        dictionary_values: &StringArray,
    ) -> Result<Self> {
        let mut values_builder = StringBuilder::with_capacity(
            dictionary_values.len(),
            dictionary_values.value_data().len(),
        );
        let mut map: HashMap<Box<[u8]>, K::Native> = HashMap::new();
        for i in 0..dictionary_values.len() {
            if dictionary_values.is_valid(i) {
                let value = dictionary_values.value(i);
                map.insert(
                    value.as_bytes().into(),
                    K::Native::from_usize(i)
                        .ok_or(ArrowError::DictionaryKeyOverflowError)?,
                );
                values_builder.append_value(value);
            } else {
                values_builder.append_null();
            }
        }
        Ok(Self {
            keys_builder,
            values_builder,
            map,
        })
    }
}
What I don't really like here is that the map has to be reconstructed from the dictionary values. Passing in the HashMap directly might be more efficient, but it's probably not a good idea to expose the `Box<[u8]>` encoding of its keys.
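The intended behaviour of such a seeded builder can be sketched with standard-library types only. `SeededDictBuilder` and its methods are hypothetical names for illustration, not the arrow API. The sketch fixes the key type to `u32` and assumes a seed dictionary without nulls. It rebuilds the lookup map once, in a single pass over the seed, which is the cost discussed above.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical, simplified stand-in for a string dictionary builder
/// that starts from an existing dictionary.
pub struct SeededDictBuilder {
    keys: Vec<Option<u32>>,
    values: Vec<String>,
    map: HashMap<String, u32>,
}

impl SeededDictBuilder {
    /// Seed the builder so that `dictionary[i]` keeps key `i`.
    pub fn with_dictionary(dictionary: &[&str]) -> Result<Self, String> {
        let mut values = Vec::with_capacity(dictionary.len());
        let mut map = HashMap::with_capacity(dictionary.len());
        for (i, v) in dictionary.iter().enumerate() {
            let key = u32::try_from(i).map_err(|_| "dictionary key overflow".to_string())?;
            map.insert(v.to_string(), key);
            values.push(v.to_string());
        }
        Ok(Self { keys: Vec::new(), values, map })
    }

    /// Append a value, reusing its existing key when the value is already
    /// in the dictionary and assigning the next free key otherwise.
    pub fn append(&mut self, value: &str) -> Result<u32, String> {
        if let Some(&key) = self.map.get(value) {
            self.keys.push(Some(key));
            return Ok(key);
        }
        let key = u32::try_from(self.values.len())
            .map_err(|_| "dictionary key overflow".to_string())?;
        self.values.push(value.to_string());
        self.map.insert(value.to_string(), key);
        self.keys.push(Some(key));
        Ok(key)
    }

    /// Append a null slot.
    pub fn append_null(&mut self) {
        self.keys.push(None);
    }

    /// Consume the builder and return the keys and the dictionary values.
    pub fn finish(self) -> (Vec<Option<u32>>, Vec<String>) {
        (self.keys, self.values)
    }
}
```

Values already present in the seed keep their original keys, so arrays built from the same seed stay key-compatible for as long as no new values are appended. New values extend the dictionary at the end and leave the seeded keys unchanged.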