Apache Arrow / ARROW-14653

[R] head() hangs on CSV datasets > 600MB

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0.0
    • Component/s: R

    Description

      I'm calling head() on a dataset made up of CSV files. I want to preview the dataset before I do anything with it that is more computationally expensive.

      library(arrow)
      library(dplyr)
      open_dataset("../../data/nyc-raw/", format = "csv") %>%
        head(1) %>%
        collect()
      

      I have experimented with different combinations of files in the dataset folder, and it seems to work fine when the total file size is under roughly 600 MB but hangs when it is above that. This might not even be the actual issue, but I'm struggling to narrow it down beyond adding extra files to the equation.

      I've tried running with the C++ debugger attached, but again, it just hangs.

      The files I'm using are the 2020-2021 Yellow Taxi trip records available from: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

      A bit of investigation has shown me that I can load different subsets of the files fine, but when using all of them, the session hangs.
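
      The subset experiments described above could be scripted roughly as follows. This is a minimal sketch, not the code used for this report: preview_first_n is a hypothetical helper, and it assumes the CSV files sit directly in one directory and end in ".csv". It relies on open_dataset() accepting a character vector of file paths.

```r
library(arrow)
library(dplyr)

# Hypothetical helper: build a dataset from the first `n` CSV files in `dir`,
# report how many files and megabytes that covers, and preview one row.
preview_first_n <- function(dir, n) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  chosen <- head(files, n)
  total_mb <- sum(file.size(chosen)) / 1024^2
  message(length(chosen), " file(s), ", round(total_mb, 1), " MB in total")
  open_dataset(chosen, format = "csv") %>%
    head(1) %>%
    collect()
}
```

      Calling it with increasing values of n, for example preview_first_n("../../data/nyc-raw/", 3), may help show at what combined size the call stops returning.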

            People

              Assignee: Nicola Crane (thisisnic)
              Reporter: Nicola Crane (thisisnic)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m