Apache Hudi / HUDI-5384

Make sure predicates are appropriately pushed down to HoodieFileIndex when lazy listing

Details

    Description

      The introduction of the lazy-listing capability in HUDI-4812 exposed an issue in Spark's design: predicates are pushed down into generic FileIndex implementations only during the execution phase.

      This poses the following issues:

      1. HoodieFileIndex doesn't list the table until its `listFiles` method is invoked
      2. Listing is therefore performed only during actual execution, in the `FileSourceScanExec` node
      3. Since listing isn't performed until actual execution, table statistics are initialized with bogus values (of 1 byte), and Cost-based Optimization (CBO) will make incorrect decisions based on them
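
The effect described in the list above can be illustrated with a small toy model. This is only a sketch: `LazyFileIndex`, `choose_join`, and `BROADCAST_THRESHOLD` are hypothetical names made up for illustration and are not Hudi or Spark APIs. The only detail taken from this issue is the 1-byte placeholder size reported before listing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Toy model only: these names are hypothetical and are NOT Hudi or Spark APIs.


@dataclass
class LazyFileIndex:
    # Partition value -> file sizes in bytes (stand-in for the table on storage).
    storage: Dict[str, List[int]]
    _listed: Optional[Dict[str, List[int]]] = field(default=None, init=False)

    def list_files(self, partition_filter: Callable[[str], bool]) -> Dict[str, List[int]]:
        """List only matching partitions; storage is first touched here."""
        self._listed = {p: s for p, s in self.storage.items() if partition_filter(p)}
        return self._listed

    @property
    def size_in_bytes(self) -> int:
        # Nothing has been listed yet, so only a 1-byte placeholder can be reported.
        if self._listed is None:
            return 1
        return sum(sum(sizes) for sizes in self._listed.values())


BROADCAST_THRESHOLD = 10 * 1024 * 1024  # hypothetical 10 MiB cutoff


def choose_join(index: LazyFileIndex) -> str:
    """Stand-in for a cost-based decision taken during planning."""
    return "broadcast" if index.size_in_bytes <= BROADCAST_THRESHOLD else "shuffle"


GIB = 1024 ** 3
index = LazyFileIndex({"2022-12-01": [GIB, GIB], "2022-12-02": [GIB]})

planned = choose_join(index)                   # planning runs before any listing
index.list_files(lambda p: p == "2022-12-01")  # execution pushes the predicate down
informed = choose_join(index)                  # decision with real, pruned sizes

print(planned, informed)  # prints "broadcast shuffle"
```

In this model the planner picks a broadcast join for a multi-gigabyte table because the size is only known after `list_files` runs. If the predicate reached the index during planning, the pruned size would be available when the decision is made.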

            People

              alexey.kudinkin Alexey Kudinkin
              sivabalan narayanan