Apache Arrow / ARROW-12159

[Rust][DataFusion] Support grouping on expressions

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust, Rust - DataFusion
    • Labels: None

    Description

      Use case:

      I want to group based on time windows (as defined by the `date_trunc` function).

      For example, given the table:

      +------+-------------------+---------------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
      | cpu  | host              | time                | usage_guest | usage_guest_nice | usage_idle        | usage_iowait | usage_irq | usage_nice | usage_softirq | usage_steal | usage_system       | usage_user         |
      +------+-------------------+---------------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
      | cpu0 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 65.30408773649165 | 0            | 0         | 0          | 0             | 0           | 18.444666002000673 | 16.251246261217506 |
      | cpu1 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 84.43113772402216 | 0            | 0         | 0          | 0             | 0           | 3.193612774446795  | 12.37524950097282  |
      | cpu2 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 65.96806387199344 | 0            | 0         | 0          | 0             | 0           | 15.469061876247794 | 18.56287425146831  |
      | cpu3 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 84.0478564307993  | 0            | 0         | 0          | 0             | 0           | 3.0907278165770684 | 12.861415752863932 |
      | cpu4 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 63.21036889281897 | 0            | 0         | 0          | 0             | 0           | 13.758723828377473 | 23.030907278223218 |
      | cpu5 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 83.94815553242313 | 0            | 0         | 0          | 0             | 0           | 2.991026919231221  | 13.0608175473346   |
      | cpu6 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 70.85828343276965 | 0            | 0         | 0          | 0             | 0           | 12.87425149699077  | 16.26746506987651  |
      | cpu7 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 83.9321357287122  | 0            | 0         | 0          | 0             | 0           | 3.093812375243205  | 12.974051896176206 |
      | cpu8 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 74.80079681313936 | 0            | 0         | 0          | 0             | 0           | 10.756972111708253 | 14.442231075949556 |
      | cpu9 | MacBook-Pro.local | 1617130130000000000 | 0           | 0                | 83.84845463618315 | 0            | 0         | 0          | 0             | 0           | 3.0907278165434624 | 13.060817547316466 |
      +------+-------------------+---------------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
      
      

      I want to be able to find the min and max of `usage_user`, grouped by minute:

      select 
        date_trunc('minute', cast (time as timestamp)), 
        min(usage_user), 
        max(usage_user) 
      from
        cpu 
      group by 
        date_trunc('minute', cast (time as timestamp))
      

      Or, alternatively, grouping by the position of the first select expression:

      select 
        date_trunc('minute', cast (time as timestamp)), 
        min(usage_user), 
        max(usage_user) 
      from
        cpu 
      group by 
        1
      
      Instead, as of now, I get a planning error:
      Error preparing query Error during planning: Projection references non-aggregate values
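
To make the expected result concrete, here is a minimal sketch in plain Rust (standard library only, not DataFusion's API) of what grouping on the truncated-minute expression should compute. The helper names `truncate_to_minute` and `min_max_by_minute` are hypothetical, and the sketch assumes `time` holds nanoseconds since the Unix epoch, as the sample table suggests.

```rust
use std::collections::BTreeMap;

const NANOS_PER_MINUTE: i64 = 60 * 1_000_000_000;

/// Round a nanosecond timestamp down to the start of its minute.
/// This stands in for date_trunc('minute', ...) in this sketch only.
fn truncate_to_minute(nanos: i64) -> i64 {
    nanos.div_euclid(NANOS_PER_MINUTE) * NANOS_PER_MINUTE
}

/// Group (time, usage_user) rows by the truncated minute and keep
/// the (min, max) of usage_user for each group.
fn min_max_by_minute(rows: &[(i64, f64)]) -> BTreeMap<i64, (f64, f64)> {
    let mut groups: BTreeMap<i64, (f64, f64)> = BTreeMap::new();
    for &(time, usage_user) in rows {
        // The group key is an expression over the row, not a bare column.
        let key = truncate_to_minute(time);
        let entry = groups.entry(key).or_insert((usage_user, usage_user));
        entry.0 = entry.0.min(usage_user);
        entry.1 = entry.1.max(usage_user);
    }
    groups
}

fn main() {
    // (time, usage_user) taken from the first three rows of the table above.
    let rows: [(i64, f64); 3] = [
        (1617130130000000000, 16.251246261217506),
        (1617130130000000000, 12.37524950097282),
        (1617130130000000000, 18.56287425146831),
    ];
    for (minute, (min, max)) in min_max_by_minute(&rows) {
        println!("{minute}: min={min}, max={max}");
    }
}
```

The point of the sketch is that the group key is computed per row from an expression, which is what the planner would need to accept in `GROUP BY`.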
      

          People

            Assignee: Unassigned
            Reporter: Andrew Lamb (alamb)
            Votes: 0
            Watchers: 3