Apache Arrow / ARROW-14630

[C++] DCHECK in GroupByNode when error encountered

Details

    Description

      thisisnic found that this example crashes:

      library(arrow)
      library(dplyr)
      write_dataset(group_by(iris, Species), "iris_data")
      open_dataset("iris_data") %>%
        group_by(Species) %>%
        summarise(mean(Sepal.Length)) %>%
        collect()  

      There are two bugs here:

      • StopProducing is written in a way that causes a future to be finished twice, triggering a DCHECK.
      • Consume() doesn't set the length of the key column batch, causing a spurious error because the group ID datum and the values datum will have different lengths.
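
      The two failure modes above can be sketched in plain standard C++. This is only an illustration under stated assumptions, not the Arrow code or the actual fix: OneShotSignal, SketchNode, SketchBatch, MakeKeyBatch and LengthsMatch are hypothetical names invented here, and the signal throws on a second completion where the real code would hit a DCHECK.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a future that must not be completed twice.
class OneShotSignal {
 public:
  // Completes the signal. A second call throws, playing the role of the
  // DCHECK mentioned in the report.
  void MarkFinished() {
    if (finished_.exchange(true)) {
      throw std::logic_error("signal finished twice");
    }
  }
  bool is_finished() const { return finished_.load(); }

 private:
  std::atomic<bool> finished_{false};
};

// Hypothetical node whose stop path tolerates repeated calls.
class SketchNode {
 public:
  // Returns true only for the call that actually finished the signal.
  bool StopProducing() {
    if (stop_requested_.exchange(true)) {
      return false;  // an earlier call already handled shutdown
    }
    done_.MarkFinished();
    return true;
  }
  const OneShotSignal& done() const { return done_; }

 private:
  std::atomic<bool> stop_requested_{false};
  OneShotSignal done_;
};

// Hypothetical batch: columns plus an explicit row count.
struct SketchBatch {
  std::vector<std::vector<int>> columns;
  std::size_t length = 0;
};

// Builds a one-column key batch and records its row count explicitly.
SketchBatch MakeKeyBatch(const std::vector<int>& keys) {
  SketchBatch batch;
  batch.columns.push_back(keys);
  batch.length = keys.size();  // the kind of step the report says was missing
  assert(batch.length == batch.columns[0].size());
  return batch;
}

// True when the key batch and the values describe the same number of rows.
bool LengthsMatch(const SketchBatch& keys, const std::vector<int>& values) {
  return keys.length == values.size();
}
```

      In this sketch a second StopProducing() call becomes a no-op instead of completing the signal again, and a key batch built through MakeKeyBatch always reports the same length as its column, so the length comparison against the values cannot fail spuriously.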

            People

              lidavidm David Li
              thisisnic Nicola Crane
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m