Apache Arrow / ARROW-11174

[C++][Dataset] Make Expressions available for projection

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 4.0.0
    • Component/s: C++

    Description

      RecordBatchProjector should be replaced by an expression calling the "project" compute function.

      Projection currently supports only reordering and subselection of fields, materializing virtual columns where necessary. Replacement with an Expression will enable specifying arbitrary expressions for projected columns:

      // project an explicit selection:
      // SELECT a as "a", b as "b" ...
      project({field_ref("a"), field_ref("b")}, {"a", "b"});
      
      // project an arithmetic expression:
      // SELECT a + b as "a + b" ...
      project({add(field_ref("a"), field_ref("b"))}, {"a + b"});
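
      To make the intended semantics concrete, here is a minimal, self-contained sketch of evaluating such a projection against a single row. Everything in it (Row, Expr, Projection, Apply) is a hypothetical stand-in written for this illustration, not Arrow's actual classes or signatures; only the names field_ref, add and project mirror the snippets above.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; none of these are Arrow types.
using Row = std::map<std::string, long>;       // one row: column name -> value
using Expr = std::function<long(const Row&)>;  // an expression maps a row to a value

// Reference a column of the input row by name (throws if the column is absent).
Expr field_ref(std::string name) {
  return [name](const Row& row) { return row.at(name); };
}

// Sum of two sub-expressions, evaluated against the same row.
Expr add(Expr lhs, Expr rhs) {
  return [lhs, rhs](const Row& row) { return lhs(row) + rhs(row); };
}

// A projection pairs each output expression with an output column name.
struct Projection {
  std::vector<Expr> exprs;
  std::vector<std::string> names;
};

Projection project(std::vector<Expr> exprs, std::vector<std::string> names) {
  if (exprs.size() != names.size()) {
    throw std::invalid_argument("project: one name is required per expression");
  }
  return Projection{std::move(exprs), std::move(names)};
}

// Evaluate every projected expression and collect the results under the new names.
Row Apply(const Projection& projection, const Row& row) {
  Row out;
  for (std::size_t i = 0; i < projection.exprs.size(); ++i) {
    out[projection.names[i]] = projection.exprs[i](row);
  }
  return out;
}
```

      In this sketch, applying project({add(field_ref("a"), field_ref("b"))}, {"a + b"}) to a row with a = 2 and b = 5 produces a single output column "a + b" holding 7, while the explicit selection simply copies a and b through and drops any other column.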

      This will also allow the same expression optimization machinery used for filters to be directly applied to projections. Virtual columns become a consequence of constant folding:

      // project in a partition where a == 3:
      assert(
        SimplifyWithGuarantee(
          project({field_ref("a"), field_ref("b")}, {"a", "b"}),
          equal(field_ref("a"), literal(3))
        )
        == project({literal(3), field_ref("b")}, {"a", "b"})
      );
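
      The following is a rough, self-contained sketch of how a partition guarantee could turn a field reference into a literal. It uses a toy expression tree, not Arrow's Expression class: the Expr struct, MakeExpr, call and Equals are hypothetical helpers, and this SimplifyWithGuarantee only understands guarantees of the form equal(field_ref(x), literal(v)) plus folding of add over two literals. The real optimizer is assumed to be considerably more general.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree for illustration only; NOT Arrow's Expression class.
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  enum Kind { kField, kLiteral, kCall } kind;
  std::string name;              // kField: column name; kCall: function name
  long value;                    // kLiteral: constant value
  std::vector<ExprPtr> args;     // kCall: arguments
  std::vector<std::string> out;  // "project" calls only: output column names
};

ExprPtr MakeExpr(Expr e) { return std::make_shared<Expr>(std::move(e)); }
ExprPtr field_ref(std::string n) { return MakeExpr({Expr::kField, std::move(n), 0, {}, {}}); }
ExprPtr literal(long v) { return MakeExpr({Expr::kLiteral, "", v, {}, {}}); }
ExprPtr call(std::string fn, std::vector<ExprPtr> args, std::vector<std::string> out = {}) {
  return MakeExpr({Expr::kCall, std::move(fn), 0, std::move(args), std::move(out)});
}
ExprPtr add(ExprPtr l, ExprPtr r) { return call("add", {l, r}); }
ExprPtr equal(ExprPtr l, ExprPtr r) { return call("equal", {l, r}); }
ExprPtr project(std::vector<ExprPtr> exprs, std::vector<std::string> names) {
  return call("project", std::move(exprs), std::move(names));
}

// Structural equality of two expression trees.
bool Equals(const ExprPtr& a, const ExprPtr& b) {
  if (a->kind != b->kind || a->name != b->name || a->value != b->value ||
      a->out != b->out || a->args.size() != b->args.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a->args.size(); ++i) {
    if (!Equals(a->args[i], b->args[i])) return false;
  }
  return true;
}

// Rewrite `e` assuming `guarantee` holds for every row it will be evaluated on.
ExprPtr SimplifyWithGuarantee(const ExprPtr& e, const ExprPtr& guarantee) {
  // Only guarantees shaped like equal(field_ref(x), literal(v)) are used here.
  const bool usable = guarantee->kind == Expr::kCall && guarantee->name == "equal" &&
                      guarantee->args.size() == 2 &&
                      guarantee->args[0]->kind == Expr::kField &&
                      guarantee->args[1]->kind == Expr::kLiteral;
  if (!usable || e->kind == Expr::kLiteral) return e;
  if (e->kind == Expr::kField) {
    // The guaranteed field becomes a constant; other fields are left alone.
    return e->name == guarantee->args[0]->name ? guarantee->args[1] : e;
  }
  std::vector<ExprPtr> args;
  for (const auto& arg : e->args) args.push_back(SimplifyWithGuarantee(arg, guarantee));
  // Constant folding: add(literal, literal) collapses to a single literal.
  if (e->name == "add" && args.size() == 2 && args[0]->kind == Expr::kLiteral &&
      args[1]->kind == Expr::kLiteral) {
    return literal(args[0]->value + args[1]->value);
  }
  return call(e->name, std::move(args), e->out);
}
```

      With these definitions, simplifying project({field_ref("a"), field_ref("b")}, {"a", "b"}) under the guarantee equal(field_ref("a"), literal(3)) compares equal (via Equals) to project({literal(3), field_ref("b")}, {"a", "b"}), which is the virtual-column behaviour described above.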

       

            People

              Assignee: Ben Kietzman (bkietz)
              Reporter: Ben Kietzman (bkietz)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5h 10m