Apache Arrow / ARROW-7639

[R] Cannot convert Dictionary Array to R when values aren't strings

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.1
    • Fix Version/s: 0.16.0
    • Component/s: R
    • Environment:
      Ubuntu 16.04.5 LTS

      Description

      I got an error in R when using arrow::read_feather() to read a Feather file that was written from Python:

      #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
      #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R

      I could reproduce the issue with a minimal example:

      In Python:

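      import pandas as pd
      import pyarrow as pa

      # A float column plus a categorical copy of it: the categories are floats, not strings
      df = pd.DataFrame({"float": [0.1, 0.2, 0.5, 0.001]})
      df["category"] = df["float"].astype("category")
      df.dtypes
      #' float float64
      #' A object
      #' category category
      #' dtype: object
      # Write the frame to Feather (pandas delegates this to pyarrow)
      df.to_feather("series.feather")
      pa.__version__
      #' '0.15.1'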
      

      From R:

      arrow::read_feather("series.feather")
      #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
      #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
      #' Backtrace:
      #' █
      #' 1. └─arrow::read_feather("series.feather")
      #' 2. ├─[ base::as.data.frame(...) ]
      #' 3. └─arrow:::as.data.frame.Table(out)
      #' 4. └─arrow:::Table__to_dataframe(x, use_threads = option_use_threads())
      

      The Feather file is read back correctly in Python:

      ft = pd.read_feather("series.feather")
      ft.dtypes
      #' float        float64
      #' A             object
      #' category    category
      #' dtype: object
      
      R session info:

      sessionInfo()
      #' R version 3.5.1 (2018-07-02)
      #' Platform: x86_64-conda_cos6-linux-gnu (64-bit)
      #' Running under: Ubuntu 16.04.5 LTS
      #' 
      #' Matrix products: default
      #' BLAS/LAPACK: /misc/DLshare/home/etbellem/miniconda3/lib/R/lib/libRblas.so
      #' 
      #' locale:
      #' [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
      #' [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
      #' [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
      #' [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
      #' [9] LC_ADDRESS=C LC_TELEPHONE=C
      #' [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
      #' 
      #' attached base packages:
      #' [1] stats graphics grDevices utils datasets methods base
      #' 
      #' loaded via a namespace (and not attached):
      #' [1] Rcpp_1.0.3 arrow_0.15.1 crayon_1.3.4 assertthat_0.2.1
      #' [5] R6_2.4.1 magrittr_1.5 rlang_0.4.2 rstudioapi_0.10
      #' [9] bit64_0.9-7 glue_1.3.1 purrr_0.3.3 bit_1.1-15.1
      #' [13] compiler_3.5.1 tidyselect_0.2.5

              People

              • Assignee: Neal Richardson (npr)
              • Reporter: Etienne Racine (etiennebr)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 0.5h