Apache Arrow / ARROW-6882

[Python] cannot create a chunked_array from dictionary_encoding result

    Details

      Description

      Calling `pa.chunked_array` directly on the indices of a dictionary-encoded array raises an unexpected `ArrowInvalid` error (code below). Taking a view of the indices first works around the problem.

      import pyarrow as pa
      ca = pa.array(['a', 'a', 'b', 'b', 'c'])
      fca = ca.dictionary_encode()
      fca.indices
      <pyarrow.lib.Int32Array object at 0x1250fb888>
      [
        0,
        0,
        1,
        1,
        2
      ]
      
      pa.chunked_array([fca.indices])
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-44-71ca3b877e1c> in <module>
      ----> 1 pa.chunked_array([fca.indices])
      
      ~/Projects/miniconda3/envs/pyarrow/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.chunked_array()
      
      ~/Projects/miniconda3/envs/pyarrow/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: Unexpected dictionary values in array of type int32
      
      # with a view of the indices (same type), it works
      pa.chunked_array([fca.indices.view(fca.indices.type)])
      Out[45]: 
      <pyarrow.lib.ChunkedArray object at 0x12508dc78>
      [
        [
          0,
          0,
          1,
          1,
          2
        ]
      ]
       

              People

              • Assignee: Joris Van den Bossche (jorisvandenbossche)
              • Reporter: Artem KOZHEVNIKOV (ArtemK)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 40m