Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Trying to create a DictionaryArray with from_buffers crashes:
>>> import pyarrow as pa
>>> a = pa.array(["one", "two", "three", "two", "one"]).dictionary_encode()
>>> b = pa.DictionaryArray.from_buffers(a.type, len(a), a.indices.buffers())
../src/arrow/array/array_dict.cc:83: Check failed: (data->dictionary) != (nullptr)
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcb26)[0x7fa850076b26]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcaa4)[0x7fa850076aa4]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcac6)[0x7fa850076ac6]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7fa850076e25]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow15DictionaryArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x1b9)[0x7fa84fad33fb]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN9__gnu_cxx13new_allocatorIN5arrow15DictionaryArrayEE9constructIS2_JRKSt10shared_ptrINS1_9ArrayDataEEEEEvPT_DpOT0_+0x49)[0x7fa84fc0f9f5]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt16allocator_traitsISaIN5arrow15DictionaryArrayEEE9constructIS1_JRKSt10shared_ptrINS0_9ArrayDataEEEEEvRS2_PT_DpOT0_+0x38)[0x7fa84fc0d44d]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow15DictionaryArrayESaIS1_ELN9__gnu_cxx12_Lock_policyE2EEC2IJRKSt10shared_ptrINS0_9ArrayDataEEEEES2_DpOT_+0xaf)[0x7fa84fc0a027]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN5arrow15DictionaryArrayESaIS5_EJRKSt10shared_ptrINS4_9ArrayDataEEEEERPT_St20_Sp_alloc_shared_tagIT0_EDpOT1_+0xb2)[0x7fa84fc04560]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt12__shared_ptrIN5arrow15DictionaryArrayELN9__gnu_cxx12_Lock_policyE2EEC1ISaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x4c)[0x7fa84fbffcdc]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt10shared_ptrIN5arrow15DictionaryArrayEEC2ISaIS1_EJRKS_INS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x39)[0x7fa84fbfd8f9]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt15allocate_sharedIN5arrow15DictionaryArrayESaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEES3_IT_ERKT0_DpOT1_+0x38)[0x7fa84fbfb500]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt11make_sharedIN5arrow15DictionaryArrayEJRKSt10shared_ptrINS0_9ArrayDataEEEES2_IT_EDpOT0_+0x54)[0x7fa84fbf7be6]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd36104)[0x7fa84fbf0104]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd2f2f8)[0x7fa84fbe92f8]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x99)[0x7fa84fbe1d3d]
I don't know if this can ever work with the current signature, since you can only pass the buffers and not the dictionary itself (which is not part of the buffers). In C++ there is an ArrayData::Make overload that additionally takes a dictionary. I think we should add a custom from_buffers on DictionaryArray in Cython that uses it, instead of relying on the base class from_buffers implementation.