  1. Apache Arrow
  2. ARROW-4565

[R] Reading records with all non-null decimals SEGFAULTs


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: R

    Description

      Repro:

       

      library(sparklyr)
      library(arrow)
      sc <- spark_connect(master = "local")
      sdf_len(sc, 10^5) %>% dplyr::mutate(batch = id %% 10)
      

       

      This produces the following segfault with Arrow 0.12; it does not reproduce with Arrow 0.11.

       

       *** caught segfault ***
      address 0x10, cause 'memory not mapped'
      
      Traceback:
       1: RecordBatch__to_dataframe(x, use_threads = use_threads)
       2: `as_tibble.arrow::RecordBatch`(record_entry)
       3: tibble::as_tibble(record_entry)
       4: arrow_read_stream(.)
       5: function_list[[i]](value)
       6: freduce(value, `_function_list`)
       7: `_fseq`(`_lhs`)
       8: eval(quote(`_fseq`(`_lhs`)), env, env)
       9: eval(quote(`_fseq`(`_lhs`)), env, env)
      10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
      11: invoke_static(sc, "sparklyr.ArrowConverters", "toArrowBatchRdd",     sdf, session, time_zone) %>% arrow_read_stream() %>% dplyr::bind_rows()
      12: arrow_collect(object, ...)
      

      Note that the following cast is unsupported. I can add a test if someone can suggest a way of creating a decimal type.

       

       

      batch <- table(tibble::tibble(x = 1:10))
      batch$cast(schema(x = decimal()))

       

      Error in Decimal128Type__initialize(precision, scale) : argument "precision" is missing, with no default
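
      The error above reads as R's generic missing-argument error rather than a cast failure: the message suggests decimal() forwards precision and scale to Decimal128Type__initialize() and neither has a default. A minimal base-R sketch of that behaviour follows. The names decimal_sketch and initialize_decimal128 are hypothetical stand-ins written for illustration, not functions from the arrow package, and the precision and scale values are arbitrary.

```r
# Hypothetical stand-in for the low-level initializer named in the error
# message; it only records the two values it is given.
initialize_decimal128 <- function(precision, scale) {
  list(type = "decimal128", precision = precision, scale = scale)
}

# Hypothetical stand-in for a constructor with two required arguments
# and no defaults, forwarding both to the initializer.
decimal_sketch <- function(precision, scale) {
  initialize_decimal128(precision, scale)
}

# Calling with no arguments fails only when `precision` is first evaluated,
# which is why the error is reported from the inner call.
msg <- tryCatch(decimal_sketch(), error = function(e) conditionMessage(e))
print(msg)

# Supplying both arguments succeeds.
dec <- decimal_sketch(precision = 5L, scale = 2L)
str(dec)
```

      If arrow's decimal() has the same two-argument shape, passing an explicit precision and scale would get past this particular error; whether the cast itself then succeeds is a separate question.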
      

      I'll send a PR with a fix...

            People

              Assignee: javierluraschi Javier Luraschi
              Reporter: javierluraschi Javier Luraschi
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m