Apache Arrow / ARROW-14495

[Python] DictionaryArray.from_buffers should not crash


Details

    Description

      From https://stackoverflow.com/questions/69746789/how-to-make-a-pyarrow-dictionaryarray-with-extensiontype-using-from-buffers-us

      Trying to create a DictionaryArray with from_buffers crashes:

      >>> import pyarrow as pa
      >>> a = pa.array(["one", "two", "three", "two", "one"]).dictionary_encode()
      >>> b = pa.DictionaryArray.from_buffers(a.type, len(a), a.indices.buffers())
      ../src/arrow/array/array_dict.cc:83:  Check failed: (data->dictionary) != (nullptr) 
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcb26)[0x7fa850076b26]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcaa4)[0x7fa850076aa4]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcac6)[0x7fa850076ac6]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7fa850076e25]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow15DictionaryArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x1b9)[0x7fa84fad33fb]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN9__gnu_cxx13new_allocatorIN5arrow15DictionaryArrayEE9constructIS2_JRKSt10shared_ptrINS1_9ArrayDataEEEEEvPT_DpOT0_+0x49)[0x7fa84fc0f9f5]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt16allocator_traitsISaIN5arrow15DictionaryArrayEEE9constructIS1_JRKSt10shared_ptrINS0_9ArrayDataEEEEEvRS2_PT_DpOT0_+0x38)[0x7fa84fc0d44d]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow15DictionaryArrayESaIS1_ELN9__gnu_cxx12_Lock_policyE2EEC2IJRKSt10shared_ptrINS0_9ArrayDataEEEEES2_DpOT_+0xaf)[0x7fa84fc0a027]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN5arrow15DictionaryArrayESaIS5_EJRKSt10shared_ptrINS4_9ArrayDataEEEEERPT_St20_Sp_alloc_shared_tagIT0_EDpOT1_+0xb2)[0x7fa84fc04560]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt12__shared_ptrIN5arrow15DictionaryArrayELN9__gnu_cxx12_Lock_policyE2EEC1ISaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x4c)[0x7fa84fbffcdc]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt10shared_ptrIN5arrow15DictionaryArrayEEC2ISaIS1_EJRKS_INS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x39)[0x7fa84fbfd8f9]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt15allocate_sharedIN5arrow15DictionaryArrayESaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEES3_IT_ERKT0_DpOT1_+0x38)[0x7fa84fbfb500]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt11make_sharedIN5arrow15DictionaryArrayEJRKSt10shared_ptrINS0_9ArrayDataEEEES2_IT_EDpOT0_+0x54)[0x7fa84fbf7be6]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd36104)[0x7fa84fbf0104]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd2f2f8)[0x7fa84fbe92f8]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x99)[0x7fa84fbe1d3d]
      

      I don't know whether this can ever work with the current signature, since you can only pass the buffers and not the dictionary itself (which is not part of the buffers). In C++, ArrayData::Make has an overload that additionally takes a dictionary. I think we should add a custom from_buffers to DictionaryArray in Cython that uses that overload instead of the base class from_buffers implementation.

            People

              milesgranger Miles Granger
              jorisvandenbossche Joris Van den Bossche

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h