[HUDI-3760] Rebase ColStats onto fetching Records by Column prefix - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.11.0
Component/s: None
Labels:
- pull-request-available

Story Points:
8
Epic Link:
RFC-27 Multi Modal Indexing

Description

Right now, all ColStats records, for all columns and all files, are read to compose the index used in Data Skipping.

In reality, an individual query touches only a handful of columns at any given moment, so we can very effectively prune the number of records we fetch simply by fetching only the records for the columns referenced in the query. This can be done by key prefix, since the ColStats (CS) record key is a concatenation of column, partition path, and file name.
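To illustrate the idea, here is a minimal sketch of a prefix lookup over sorted keys. It is not the Hudi implementation: the plain-string key layout, the `SEP` separator, and the names `make_key` and `fetch_by_column_prefix` are all assumptions made for this example, and the actual key encoding used by the metadata table is not shown here.

```python
from bisect import bisect_left

SEP = "\x00"  # hypothetical separator between key parts (assumption)


def make_key(column, partition_path, file_name):
    """Compose an illustrative key: column, then partition path, then file name."""
    return SEP.join((column, partition_path, file_name))


def fetch_by_column_prefix(sorted_keys, records, columns):
    """Return only the records whose key starts with one of the given columns.

    sorted_keys must be sorted; each column costs one binary search plus a
    scan over that column's own records, instead of a scan over every record.
    """
    hits = []
    for column in columns:
        prefix = column + SEP
        i = bisect_left(sorted_keys, prefix)  # first key >= prefix
        while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
            hits.append(records[sorted_keys[i]])
            i += 1
    return hits


# Illustrative data: 3 columns x 2 partitions x 2 files = 12 stats records.
records = {}
for col in ("a", "b", "c"):
    for part in ("2022/03/30", "2022/03/31"):
        for fname in ("f1.parquet", "f2.parquet"):
            records[make_key(col, part, fname)] = {
                "column": col, "partition": part, "file": fname,
            }
sorted_keys = sorted(records)

# A query that references only column "b" fetches 4 of the 12 records.
hits = fetch_by_column_prefix(sorted_keys, records, ["b"])
```

Because the column is the leading part of the key, all records for one column are contiguous in sorted order, which is what makes the prefix scan sufficient.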

Issue Links

links to

GitHub Pull Request #5208

People

Assignee: Alexey Kudinkin

Reporter: Alexey Kudinkin

Votes: 0

Watchers: 2

Dates

Created: 31/Mar/22 16:05

Updated: 06/Apr/22 16:36

Resolved: 06/Apr/22 16:36