Apache Arrow / ARROW-13803

[C++] Segfault on filtering taxi dataset

Details

    Description

      Found this while testing ARROW-13740. Using the nyc-taxi dataset:

      ds %>%
        filter(total_amount > 0, passenger_count > 0) %>%
        summarise(n = n()) %>%
        collect()
      
       *** caught segfault ***
      address 0x161784000, cause 'invalid permissions'
      
      Traceback:
       1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
      ...
      

      lldb shows:

      * thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
          frame #0: 0x000000013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
      libarrow.600.dylib`arrow::BitUtil::SetBitmap:
      ->  0x13a79d9cc <+296>: ldrb   w10, [x8]
          0x13a79d9d0 <+300>: cmp    w9, #0x8                  ; =0x8 
          0x13a79d9d4 <+304>: cset   w11, lo
          0x13a79d9d8 <+308>: and    w9, w9, #0x7
      Target 0: (R) stopped.
      (lldb) 
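
The fault above is at a byte load inside `arrow::BitUtil::SetBitmap`, which takes a data pointer, an offset, and a length. A bad access at that point would be consistent with the requested bit range running past the end of the allocated buffer, though nothing in this report confirms that is the cause. As a minimal sketch of the operation (not Arrow's implementation; `SetBitRangeChecked` is a hypothetical helper, and the `std::vector` buffer is an assumption made so the bounds can be checked), setting a range of bits in an LSB-ordered bitmap with an explicit bounds check might look like this:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, NOT Arrow's SetBitmap: sets bits [offset, offset + length)
// in an LSB-ordered bitmap (bit i lives in byte i / 8 at position i % 8).
// Unlike a raw-pointer version, it refuses ranges that exceed the allocation.
void SetBitRangeChecked(std::vector<uint8_t>& bitmap, int64_t offset, int64_t length) {
  const int64_t capacity_bits = static_cast<int64_t>(bitmap.size()) * 8;
  // Written as "offset > capacity - length" to avoid overflow in offset + length.
  if (offset < 0 || length < 0 || offset > capacity_bits - length) {
    throw std::out_of_range("bit range exceeds bitmap allocation");
  }
  for (int64_t i = offset; i < offset + length; ++i) {
    bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  }
}
```

With a raw `unsigned char*` there is no size to check against, so an offset or length computed for a larger batch than the buffer was allocated for would write past the end and could fault exactly as in the trace above.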
      

      Interestingly, each of those filter expressions evaluates fine on its own; it only seems to crash when both are provided. I can also count over the data grouped by both:

      ds %>% 
        group_by(total_amount > 0, passenger_count > 0) %>% 
        summarize(n=n()) %>% 
        collect()
      
      # A tibble: 4 × 3
        `total_amount > 0` `passenger_count > 0`          n
        <lgl>              <lgl>                      <int>
      1 FALSE              FALSE                        805
      2 FALSE              TRUE                      368680
      3 TRUE               FALSE                    5810556
      4 TRUE               TRUE                  1541561340
      

            People

              lidavidm David Li
              npr Neal Richardson

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m