Apache Arrow / ARROW-17915

[C++] Error when using Substrait ProjectRel

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.0.0
    • Component/s: C++

    Description

      After ARROW-16989 and ARROW-15584, ProjectRel behaves differently. I implemented a solution that works with DuckDB's consumer in https://github.com/voltrondata/substrait-r/pull/181, but when I try the same plan with Arrow's compiler I get an error:

      library(arrow, warn.conflicts = FALSE)
      #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
      
      plan_as_json <- '{
        "extensionUris": [
          {
            "extensionUriAnchor": 1,
            "uri": "https://github.com/apache/arrow/blob/master/format/substrait/extension_types.yaml"
          }
        ],
        "relations": [
          {
            "rel": {
              "project": {
                "common": {"emit": {"outputMapping": [2, 3]}},
                "input": {
                  "read": {
                    "baseSchema": {
                      "names": ["int", "dbl"],
                      "struct": {"types": [{"i32": {}}, {"fp64": {}}]}
                    },
                    "localFiles": {
                      "items": [
                        {
                          "uriFile": "file://THIS_IS_THE_TEMP_FILE",
                          "parquet": {}
                        }
                      ]
                    }
                  }
                },
                "expressions": [
                  {"selection": {"directReference": {"structField": {"field": 1}}}},
                  {"selection": {"directReference": {"structField": {"field": 0}}}}
                ]
              }
            }
          }
        ]
      }'
      
      temp_parquet <- tempfile()
      write_parquet(data.frame(int = integer(), dbl = double()), temp_parquet)
      plan_as_json <- gsub("THIS_IS_THE_TEMP_FILE", temp_parquet, plan_as_json)
      arrow:::do_exec_plan_substrait(plan_as_json)
      #> Error: Invalid: Invalid column index to add field.
      #> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/relation_internal.cc:338  project_schema->AddField( num_columns + static_cast<int>(project.expressions().size()) - 1, std::move(project_field))
      #> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/serde.cc:156  FromProto(plan_rel.has_root() ? plan_rel.root().input() : plan_rel.rel(), ext_set, conversion_options)
      

      It's admittedly a goofy thing to do: computing a new column that is an identical copy of an existing column and then discarding the original. I can and should simplify the Substrait that I'm generating, but maybe this is also valid Substrait that should be accepted?
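
      A minimal sketch of what the plan above asks for, assuming the Substrait rule that a ProjectRel's direct output is the input fields followed by one field per expression, and that `emit.outputMapping` indexes into that combined list. This is plain Python for illustration, not Arrow's implementation; the helper name `project_output_columns` is hypothetical, and it models only plain field references.

```python
def project_output_columns(input_names, expression_refs, output_mapping=None):
    """Return the column names a ProjectRel would emit.

    input_names: column names of the input relation.
    expression_refs: for each project expression, the index of the input
        field it selects (only plain field references are modeled here).
    output_mapping: optional emit.outputMapping, i.e. indices into the list
        of input fields followed by expression results.
    """
    # Direct output: all input fields, then one field per expression.
    direct = list(input_names) + [input_names[i] for i in expression_refs]
    if output_mapping is None:
        return direct
    # Reject indices that fall outside the combined field list.
    for i in output_mapping:
        if not 0 <= i < len(direct):
            raise IndexError(
                f"outputMapping index {i} out of range 0..{len(direct) - 1}"
            )
    return [direct[i] for i in output_mapping]


# The plan from this issue: input (int, dbl), expressions select fields 1
# and 0, and the emit keeps only the two computed columns.
columns = project_output_columns(["int", "dbl"], [1, 0], [2, 3])
print(columns)  # ['dbl', 'int']
```

      Under these assumptions, indices 2 and 3 are in range for a two-column input with two expressions, so the plan would produce the columns `dbl` and `int` in that order.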

            People

              Assignee: Vibhatha Lakmal Abeykoon (vibhatha)
              Reporter: Dewey Dunnington (paleolimbot)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m