Apache Arrow / ARROW-16649

[C++] Add support for sorting to the Substrait consumer

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      The streaming execution engine supports sorting (I believe as an option on a sink node), but the Substrait consumer does not currently handle sort relations. Could support for this be added?

      Here's the example code/plan I tested with (in R, using the in-development substrait package):

       

      library(dplyr)
      library(substrait)
      
      # create a basic table and order it
      out <- tibble::tibble(a = 1, b = 2) %>%
        arrow_substrait_compiler() %>%
        arrange(a)
      
      # take a look at the plan created
      out$plan()
      #> message of type 'substrait.Plan' with 2 fields set
      #> extension_uris {
      #>   extension_uri_anchor: 1
      #> }
      #> relations {
      #>   root {
      #>     input {
      #>       sort {
      #>         input {
      #>           read {
      #>             base_schema {
      #>               names: "a"
      #>               names: "b"
      #>               struct_ {
      #>                 types {
      #>                   fp64 {
      #>                   }
      #>                 }
      #>                 types {
      #>                   fp64 {
      #>                   }
      #>                 }
      #>               }
      #>             }
      #>             named_table {
      #>               names: "named_table_1"
      #>             }
      #>           }
      #>         }
      #>         sorts {
      #>           expr {
      #>             selection {
      #>               direct_reference {
      #>                 struct_field {
      #>                 }
      #>               }
      #>             }
      #>           }
      #>           direction: SORT_DIRECTION_ASC_NULLS_LAST
      #>         }
      #>       }
      #>     }
      #>     names: "a"
      #>     names: "b"
      #>   }
      #> }
      
      # try to run the plan
      collect(out)
      #> Error: NotImplemented: conversion to arrow::compute::Declaration from Substrait relation sort {
      ...
      #> /home/nic2/arrow/cpp/src/arrow/engine/substrait/serde.cc:73  FromProto(plan_rel.rel(), ext_set)
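
For illustration, here is a minimal sketch of the translation step a consumer would need for the `sorts` entry in the plan above: turning a Substrait sort direction plus a field reference into a sort key. All type and function names (`SortKey`, `Order`, `NullPlacement`, `SubstraitSortDirection`, `ToSortKey`) are hypothetical stand-ins and are not Arrow's or Substrait's actual C++ API; the enumerators only mirror the names of Substrait's `SortField.SortDirection` values. Note that the empty `struct_field {}` in the plan means field index 0 (the protobuf default), i.e. column `a`.

```cpp
#include <optional>

// Hypothetical stand-ins for whatever the execution engine's sort
// options expect; these are NOT Arrow's real types.
enum class Order { Ascending, Descending };
enum class NullPlacement { AtStart, AtEnd };

struct SortKey {
  int field_index;      // position of the column in the input schema
  Order order;          // ascending or descending
  NullPlacement nulls;  // where nulls go relative to non-null values
};

// Hypothetical mirror of the names in Substrait's SortField.SortDirection.
enum class SubstraitSortDirection {
  Unspecified,
  AscNullsFirst,
  AscNullsLast,
  DescNullsFirst,
  DescNullsLast,
  Clustered
};

// Translate one Substrait sort field into a sort key.
// Returns std::nullopt for directions this sketch does not handle
// (Unspecified and Clustered), so the caller can report NotImplemented.
std::optional<SortKey> ToSortKey(int field_index, SubstraitSortDirection dir) {
  switch (dir) {
    case SubstraitSortDirection::AscNullsFirst:
      return SortKey{field_index, Order::Ascending, NullPlacement::AtStart};
    case SubstraitSortDirection::AscNullsLast:
      return SortKey{field_index, Order::Ascending, NullPlacement::AtEnd};
    case SubstraitSortDirection::DescNullsFirst:
      return SortKey{field_index, Order::Descending, NullPlacement::AtStart};
    case SubstraitSortDirection::DescNullsLast:
      return SortKey{field_index, Order::Descending, NullPlacement::AtEnd};
    default:
      return std::nullopt;
  }
}
```

For the plan above, `ToSortKey(0, SubstraitSortDirection::AscNullsLast)` would yield an ascending key on field 0 with nulls placed at the end. How such a key is then attached to the execution plan (for example through a sink node's options) is left open here, since that depends on the engine.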
      

          People

            Assignee: Unassigned
            Reporter: Nicola Crane (thisisnic)
            Votes: 0
            Watchers: 3