Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
The streaming execution engine supports sorting (I believe as a sink node option?), but the Substrait consumer does not currently convert Substrait sort relations into `arrow::compute::Declaration`s. Please can we have support for this?
Here is the example code and the resulting plan I tested with (in R, using the in-development substrait package):
library(dplyr)
library(substrait)

# create a basic table and order it
out <- tibble::tibble(a = 1, b = 2) %>%
  arrow_substrait_compiler() %>%
  arrange(a)

# take a look at the plan created
out$plan()
#> message of type 'substrait.Plan' with 2 fields set
#> extension_uris {
#>   extension_uri_anchor: 1
#> }
#> relations {
#>   root {
#>     input {
#>       sort {
#>         input {
#>           read {
#>             base_schema {
#>               names: "a"
#>               names: "b"
#>               struct_ {
#>                 types {
#>                   fp64 {
#>                   }
#>                 }
#>                 types {
#>                   fp64 {
#>                   }
#>                 }
#>               }
#>             }
#>             named_table {
#>               names: "named_table_1"
#>             }
#>           }
#>         }
#>         sorts {
#>           expr {
#>             selection {
#>               direct_reference {
#>                 struct_field {
#>                 }
#>               }
#>             }
#>           }
#>           direction: SORT_DIRECTION_ASC_NULLS_LAST
#>         }
#>       }
#>     }
#>     names: "a"
#>     names: "b"
#>   }
#> }

# try to run the plan
collect(out)
#> Error: NotImplemented: conversion to arrow::compute::Declaration from Substrait relation sort { ...
#> /home/nic2/arrow/cpp/src/arrow/engine/substrait/serde.cc:73  FromProto(plan_rel.rel(), ext_set)
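The sort relation in the plan above carries one sort field: a direct reference to the first column (the empty `struct_field` means field 0, i.e. `a`) with direction `SORT_DIRECTION_ASC_NULLS_LAST`. Part of supporting this would be turning each such field into an engine-side sort key. Below is a minimal, self-contained C++ sketch of only that mapping step. All names here (`SubstraitSortField`, `SortOptions`, `ConvertSortFields`, and the enums) are hypothetical stand-ins, not Arrow's or Substrait's real C++ APIs. The only thing taken from Substrait is the set of direction values. The sketch also assumes the engine accepts a single null placement shared by all sort keys.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of Substrait's SortField.SortDirection enum; the
// numeric values follow the Substrait protobuf definition.
enum class SortDirection {
  kUnspecified = 0,
  kAscNullsFirst = 1,
  kAscNullsLast = 2,
  kDescNullsFirst = 3,
  kDescNullsLast = 4,
  kClustered = 5,
};

// Hypothetical stand-ins for engine-side sort options (not Arrow's API).
enum class SortOrder { kAscending, kDescending };
enum class NullPlacement { kAtStart, kAtEnd };

// A sort field as it appears in the plan: a column index taken from
// direct_reference.struct_field, plus a direction.
struct SubstraitSortField {
  int field_index;
  SortDirection direction;
};

struct SortKey {
  int field_index;
  SortOrder order;
};

struct SortOptions {
  std::vector<SortKey> keys;
  NullPlacement null_placement;
};

// Converts the `sorts` list of a sort relation into engine sort options.
// Assumes one null placement shared by all keys, so mixed placements are
// rejected instead of being silently changed.
SortOptions ConvertSortFields(const std::vector<SubstraitSortField>& fields) {
  if (fields.empty()) {
    throw std::invalid_argument("sort relation has no sort fields");
  }
  SortOptions options;
  std::optional<NullPlacement> placement;
  for (const SubstraitSortField& field : fields) {
    SortOrder order;
    NullPlacement nulls;
    switch (field.direction) {
      case SortDirection::kAscNullsFirst:
        order = SortOrder::kAscending;
        nulls = NullPlacement::kAtStart;
        break;
      case SortDirection::kAscNullsLast:
        order = SortOrder::kAscending;
        nulls = NullPlacement::kAtEnd;
        break;
      case SortDirection::kDescNullsFirst:
        order = SortOrder::kDescending;
        nulls = NullPlacement::kAtStart;
        break;
      case SortDirection::kDescNullsLast:
        order = SortOrder::kDescending;
        nulls = NullPlacement::kAtEnd;
        break;
      default:
        // UNSPECIFIED and CLUSTERED have no direct ascending/descending mapping.
        throw std::invalid_argument("unsupported sort direction");
    }
    if (placement.has_value() && *placement != nulls) {
      throw std::invalid_argument("mixed null placements are not supported");
    }
    placement = nulls;
    options.keys.push_back({field.field_index, order});
  }
  assert(placement.has_value());  // guaranteed because fields is non-empty
  options.null_placement = *placement;
  return options;
}
```

For the plan above, `ConvertSortFields({{0, SortDirection::kAscNullsLast}})` would produce one ascending key on column 0 with nulls placed at the end. Rejecting mixed null placements, rather than choosing one, keeps the sketch from quietly changing the row order a plan asked for.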