Apache Arrow / ARROW-9132

[C++] Implement hash kernels for dictionary data with constant dictionaries

Details

    Description

      Enabling [`strings_as_dictionary`](https://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html?highlight=strings_as_dictionary#obtaining-apache-arrow-result-sets) in `turbodbc` returns a `ChunkedArray` of `dictionary` type (IIUC).

      I'd like to enable this for better performance; however, it seems that not all functionality is implemented for `dictionary` types. In particular, `unique` appears not to be implemented:

      In [40]: nmi.__class__.mro()
      Out[40]: [pyarrow.lib.ChunkedArray, pyarrow.lib._PandasConvertible, object]
      
      In [41]: nmi.type
      Out[41]: DictionaryType(dictionary<values=string, indices=int32, ordered=0>)
      
      In [42]: nmi.unique()
      Traceback (most recent call last):
      
        File "<ipython-input-42-0fcb7893d5c4>", line 1, in <module>
          nmi.unique()
      
        File "pyarrow\table.pxi", line 307, in pyarrow.lib.ChunkedArray.unique
      
        File "pyarrow\error.pxi", line 106, in pyarrow.lib.check_status
      
      ArrowNotImplementedError: unique not implemented for dictionary<values=string, indices=int32, ordered=0>
      

      It would be very useful if the `dictionary` type supported all the usual operations.
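
To illustrate the idea in the issue title, here is a minimal pure-Python sketch of how `unique` could work when every chunk shares one constant dictionary: deduplicate the integer indices only, then map the surviving indices through the dictionary once. This is not Arrow's implementation. The function name `unique_with_constant_dictionary` and the `(indices, dictionary)` tuple layout are hypothetical stand-ins for dictionary-encoded chunks, and `None` stands in for a null index.

```python
def unique_with_constant_dictionary(chunks):
    """Return unique values, in order of first appearance, for chunks that
    all share the same dictionary.

    `chunks` is a list of (indices, dictionary) pairs (a hypothetical layout
    used only for this sketch). A `None` index represents a null value.
    """
    if not chunks:
        return []

    dictionary = chunks[0][1]
    # Assumption of this sketch: the dictionary is constant across chunks.
    for _, chunk_dictionary in chunks:
        if chunk_dictionary != dictionary:
            raise ValueError("chunks do not share a constant dictionary")

    seen = set()
    unique_indices = []
    # Hash the small integer indices rather than the string values.
    for indices, _ in chunks:
        for index in indices:
            if index not in seen:
                seen.add(index)
                unique_indices.append(index)

    # Decode each distinct index exactly once.
    return [None if i is None else dictionary[i] for i in unique_indices]


dictionary = ["a", "b", "c"]
chunks = [([0, 1, 0], dictionary), ([2, None, 1], dictionary)]
result = unique_with_constant_dictionary(chunks)
print(result)
```

Because only indices are hashed, the cost of the string comparisons is paid when the dictionary is built, not again for every row. Chunks with differing dictionaries would first need their dictionaries unified, which this sketch does not attempt.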

            People

              wesm Wes McKinney
              dhirschfeld Dave Hirschfeld

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 50m