Apache Arrow / ARROW-16288

[C++] ValueDescr::SCALAR nearly unused and does not work for projection

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Component/s: C++

    Description

      First, almost no kernels actually use the ValueDescr::SCALAR shape. Only the functions "all", "any", "list_element", "mean", "product", "struct_field", and "sum" have kernels with this shape. Most kernels that have special logic for scalars handle it by using ValueDescr::ANY instead.

      Second, when passing an expression to the project node, the expression must be bound based on the dataset schema. Since the binding happens based on a schema (and not a batch), every argument is assumed to be an array and the function is bound to ValueDescr::ARRAY (https://github.com/apache/arrow/blob/a16be6b7b6c8271202ff766b99c199b2e29bdfa8/cpp/src/arrow/compute/exec/expression.cc#L461).

      This results in an error if the function has only ValueDescr::SCALAR kernels. It would likely be a problem even if the function had both types of kernels, because the call would get bound to the wrong kernel.
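
      The binding problem can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Shape`, `Kernel`, `Matches`, `Dispatch`, `ShapeFromSchemaField`, and `RunExample` are hypothetical stand-ins written for this example and are not the Arrow classes or functions. The sketch assumes that a kernel declared with the ANY shape accepts either input shape, and that binding against a schema always produces the ARRAY shape.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the Arrow types.
enum class Shape { ANY, ARRAY, SCALAR };

struct Kernel {
  std::string name;
  Shape input_shape;  // shape the kernel declares for its single argument
};

// Assumption: an ANY kernel accepts both shapes; otherwise shapes must match.
bool Matches(const Kernel& kernel, Shape arg) {
  return kernel.input_shape == Shape::ANY || kernel.input_shape == arg;
}

// Return the first kernel that accepts the argument shape, if there is one.
std::optional<Kernel> Dispatch(const std::vector<Kernel>& kernels, Shape arg) {
  for (const Kernel& kernel : kernels) {
    if (Matches(kernel, arg)) return kernel;
  }
  return std::nullopt;
}

// Binding against a schema has no batch to inspect, so the argument
// is always described as an array.
Shape ShapeFromSchemaField() { return Shape::ARRAY; }

void RunExample() {
  const std::vector<Kernel> scalar_only = {{"scalar_kernel", Shape::SCALAR}};
  const std::vector<Kernel> both = {{"scalar_kernel", Shape::SCALAR},
                                    {"array_kernel", Shape::ARRAY}};
  const std::vector<Kernel> any_only = {{"any_kernel", Shape::ANY}};

  const Shape bound = ShapeFromSchemaField();

  // A function with only SCALAR kernels cannot be bound at all.
  assert(!Dispatch(scalar_only, bound).has_value());

  // With both kinds of kernels, binding always picks the array kernel,
  // even if a later batch would carry a scalar in that column.
  assert(Dispatch(both, bound)->name == "array_kernel");

  // An ANY kernel binds regardless of the shape seen at bind time.
  assert(Dispatch(any_only, bound)->name == "any_kernel");
}
```

      Calling `RunExample()` exercises the three cases: the SCALAR-only function fails to bind, the function with both kernels is always bound to its array kernel, and the ANY kernel binds in either situation.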

      The simplest fix may be to just get rid of ValueDescr and change all kernels to ValueDescr::ANY behavior. If we choose to keep it, we will need to figure out how to handle this kind of binding.

    People

      Assignee: Unassigned
      Reporter: Weston Pace (westonpace)