  1. Apache Arrow
  2. ARROW-7639

[R] Cannot convert Dictionary Array to R when values aren't strings

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.1
    • Fix Version/s: 0.16.0
    • Component/s: R
    • Environment: Ubuntu 16.04.5 LTS

    Description

      I got an error in R when using arrow::read_feather() to read a Feather file that was written from Python.

      #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
      #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
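
The type in the message describes a dictionary-encoded column: one small integer index per row (int8) pointing into an array of distinct values (double). As a rough illustration only, written in plain Python and not with Arrow's actual data structures, the layout and its decoding could be sketched as follows. The names `values`, `indices`, `decode`, and `labels` are made up for this sketch.

```python
from array import array

# Distinct values of the column, stored once (the "dictionary", here doubles).
values = array("d", [0.001, 0.1, 0.2, 0.5])
# One signed 8-bit index per row, pointing into `values`.
indices = array("b", [1, 2, 3, 0])


def decode(indices, values):
    """Expand a dictionary-encoded column back into plain per-row values."""
    return [values[i] for i in indices]


decoded = decode(indices, values)
print(decoded)

# An R factor stores integer codes plus character levels, so a reader that
# builds factors needs the dictionary values as strings.
labels = [str(v) for v in values]
print(labels)
```

The same indices can be kept as they are; only the dictionary values need a string form for a factor-like representation.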

      I could reproduce the issue with a minimal example:

      In Python:

      import pandas as pd
      import pyarrow as pa
      df = pd.DataFrame({"float": [0.1, .2, 0.5, .001]})
      df["category"] = df["float"].astype('category')
      df.dtypes
      #' float        float64
      #' A             object
      #' category    category
      #' dtype: object
      df.to_feather("series.feather")
      pa.__version__
      #' '0.15.1'
      

      From R:

      arrow::read_feather("series.feather")
      #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
      #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
      #' Backtrace:
      #' █
      #' 1. └─arrow::read_feather("series.feather")
      #' 2. ├─[ base::as.data.frame(...) ]
      #' 3. └─arrow:::as.data.frame.Table(out)
      #' 4. └─arrow:::Table__to_dataframe(x, use_threads = option_use_threads())
      

      The Feather file reads back correctly in Python:

      ft = pd.read_feather("series.feather")
      ft.dtypes
      #' float        float64
      #' A             object
      #' category    category
      #' dtype: object
      
      R session info:

      sessionInfo()
      #' R version 3.5.1 (2018-07-02)
      #' Platform: x86_64-conda_cos6-linux-gnu (64-bit)
      #' Running under: Ubuntu 16.04.5 LTS
      #' 
      #' Matrix products: default
      #' BLAS/LAPACK: /misc/DLshare/home/etbellem/miniconda3/lib/R/lib/libRblas.so
      #' 
      #' locale:
      #' [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
      #' [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
      #' [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
      #' [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
      #' [9] LC_ADDRESS=C LC_TELEPHONE=C
      #' [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
      #' 
      #' attached base packages:
      #' [1] stats graphics grDevices utils datasets methods base
      #' 
      #' loaded via a namespace (and not attached):
      #' [1] Rcpp_1.0.3 arrow_0.15.1 crayon_1.3.4 assertthat_0.2.1
      #' [5] R6_2.4.1 magrittr_1.5 rlang_0.4.2 rstudioapi_0.10
      #' [9] bit64_0.9-7 glue_1.3.1 purrr_0.3.3 bit_1.1-15.1
      #' [13] compiler_3.5.1 tidyselect_0.2.5
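
Since the failure is tied to dictionary values that are not strings, one possible workaround on the Python side is to give the categorical column string categories before writing the file. The sketch below is an untested assumption about what avoids the error, not a confirmed fix; `stringify_categories` is a hypothetical helper name, and the `to_feather` call is left commented out because it needs pyarrow installed.

```python
import pandas as pd


def stringify_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every categorical column has string categories."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            # Keep the codes, replace each category value by its string form.
            new_categories = [str(c) for c in col.cat.categories]
            out[name] = col.cat.rename_categories(new_categories)
    return out


df = pd.DataFrame({"float": [0.1, 0.2, 0.5, 0.001]})
df["category"] = df["float"].astype("category")

fixed = stringify_categories(df)
print(list(fixed["category"].cat.categories))

# Assumption: with string categories the column is written as a dictionary
# of strings, which the R reader is expected to convert to a factor.
# fixed.to_feather("series.feather")
```

The original numeric column is left untouched, so the numeric values remain available in R even though the category levels arrive as strings.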

            People

              Assignee: Neal Richardson (npr)
              Reporter: Etienne Racine (etiennebr)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h