Apache Arrow / ARROW-16164

[C++] Pushdown filters on augmented columns like fragment filename

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      Following the discussion on ARROW-15260: given the following filter in R, we might expect it to be pushed down so that only the files matching the pattern are read:

        filter = Expression$create(
          "match_substring",
          Expression$field_ref("__filename"),
          options = list(pattern = "cyl=8")
        )
      

      As mentioned by westonpace:

      "You might think we would get the hint and only read files matching that pattern. This is not the case. We will read the entire dataset and apply the "cyl=8" filter in memory.

      If we want to pushdown filters on the filename column we will need to add some special logic."
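
      To make the requested behaviour concrete, here is a minimal, self-contained sketch of evaluating a filename filter against fragment paths before any file is opened. It is an illustration only and does not use the Arrow C++ API: `Fragment`, `FilenameSubstringFilter`, `MayMatch` and `PruneFragments` are hypothetical names invented for this example, and the substring test merely stands in for `match_substring` applied to `__filename`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a dataset fragment; only its path matters here.
struct Fragment {
  std::string path;
};

// Hypothetical stand-in for a `match_substring` filter on `__filename`.
struct FilenameSubstringFilter {
  std::string pattern;
};

// Returns true when the fragment's path contains the filter pattern.
// The path is known before the file is opened, so this costs no I/O.
bool MayMatch(const Fragment& fragment, const FilenameSubstringFilter& filter) {
  return fragment.path.find(filter.pattern) != std::string::npos;
}

// Keeps only the fragments whose path satisfies the filter, so that a
// later scan never has to read the files that were pruned.
std::vector<Fragment> PruneFragments(const std::vector<Fragment>& fragments,
                                     const FilenameSubstringFilter& filter) {
  std::vector<Fragment> kept;
  for (const auto& fragment : fragments) {
    if (MayMatch(fragment, filter)) {
      kept.push_back(fragment);
    }
  }
  return kept;
}
```

      With paths such as `data/cyl=4/part-0.parquet` and `data/cyl=8/part-0.parquet` and the pattern "cyl=8", only the second fragment would remain to be scanned. A real implementation would presumably need to recognise which parts of an arbitrary filter expression refer only to augmented columns; that is not attempted here.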

            People

              Assignee: Unassigned
              Reporter: Nicola Crane (thisisnic)
              Votes: 0
              Watchers: 3