Apache Arrow / ARROW-10114

[R] Segfault in to_dataframe_parallel with deeply nested structs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.1
    • Fix Version/s: 2.0.0
    • Component/s: R

    Description

      A .jsonl file (newline-delimited JSON), created from the open data available at ftp://ftp.libris.kb.se/pub/spa/swepub-deduplicated-2019-12-29.zip, is read with the R package arrow (installed from CRAN) using the following statement:

      > arrow::read_json_arrow("~/.config/swepub/head.jsonl")

      It crashes RStudio with no error message. At the R prompt, the error message is:

      Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
      SET_VECTOR_ELT() can only be applied to a 'list', not a 'integer'
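
The error above is raised from Table__to_dataframe, that is, while an already parsed Arrow Table is being converted to an R data frame. A minimal sketch for separating parsing from conversion follows. It assumes a small, hypothetical nested record in place of the real head.jsonl (whose contents are not reproduced here), and it may not trigger the crash by itself.

```r
library(arrow)

# Hypothetical stand-in for head.jsonl: one record with nested structs.
path <- tempfile(fileext = ".jsonl")
writeLines('{"id": 1, "meta": {"source": {"name": "a", "tags": {"n": 2}}}}', path)

# Parse only: as_data_frame = FALSE returns an Arrow Table and skips
# the Table -> data.frame conversion where the error is reported.
tbl <- read_json_arrow(path, as_data_frame = FALSE)

# Inspect the inferred (nested) column types.
print(tbl$schema)

# Run the conversion as a separate step, so a failure here can be
# attributed to the conversion rather than to JSON parsing.
df <- as.data.frame(tbl)
```

If the first call succeeds and only the last line fails, the problem is in the conversion of nested columns rather than in the JSON reader.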

      The file "head.jsonl" above was created from the .jsonl file extracted from the zip, using the *nix command "head -1 $BIG_JSONL_FILE". The same file parses without error with jsonlite and tidyjson.
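
As a rough illustration of that step, the sketch below takes the first line of a .jsonl file in R and checks that it parses with jsonlite. The two-record input file is a hypothetical stand-in for the large extracted file, and the variable names are illustrative only.

```r
library(jsonlite)

# Hypothetical stand-in for the large extracted .jsonl file.
big <- tempfile(fileext = ".jsonl")
writeLines(c('{"id": 1, "meta": {"source": {"name": "a"}}}',
             '{"id": 2, "meta": {"source": {"name": "b"}}}'), big)

# Rough equivalent of `head -1 $BIG_JSONL_FILE > head.jsonl`.
head_file <- tempfile(fileext = ".jsonl")
writeLines(readLines(big, n = 1), head_file)

# Confirm that the single record is valid JSON and look at its structure.
rec <- fromJSON(readLines(head_file, n = 1))
str(rec)
```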

      At one point, the following error also appeared:

      > arrow::read_json_arrow("head.jsonl", as_data_frame = TRUE)

       *** caught segfault ***
      address 0x8, cause 'memory not mapped'

      Traceback:
      1: structure(x, extra_cols = colonnade[extra_cols], class = "pillar_squeezed_colonnade")
      2: new_colonnade_sqeezed(out, colonnade = x, extra_cols = extra_cols)
      3: pillar::squeeze(x$mcf, width = width)
      4: format.trunc_mat(mat)
      5: format(mat)
      6: format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
      7: format(x, ..., n = n, width = width, n_extra = n_extra)
      8: paste0(..., collapse = "\n")
      9: cli::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
      10: print.tbl
      11: (function (x, ...) UseMethod("print"))

      Attachments

        1. Dockerfile (0.2 kB, Markus Skyttner)
        2. reprex_10114.R (0.8 kB, Markus Skyttner)
        3. Makefile (0.2 kB, Markus Skyttner)


            People

              Assignee: Romain Francois (romainfrancois)
              Reporter: Markus Skyttner (mskyttner)


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h