Details
Type: Epic
Status: Closed
Priority: Blocker
Resolution: Done
RFC-27 Multi Modal Indexing
Description
Umbrella ticket for RFC-27. The goal is to support a global range index (per-file column value ranges kept in the metadata table) to reduce query planning time.
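For readers unfamiliar with the idea, here is a minimal sketch of how a range index can shorten query planning: the planner consults per-file min/max values and skips files whose range cannot match the filter. This is an illustration only, not Hudi's implementation; `ColumnRange`, `prune_files`, and the sample file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColumnRange:
    """Min/max of one column in one data file (hypothetical record shape)."""
    file_name: str
    column: str
    min_value: int
    max_value: int


def prune_files(index: List[ColumnRange], column: str, lo: int, hi: int) -> List[str]:
    """Return files whose [min, max] for `column` overlaps the query range [lo, hi]."""
    return [
        r.file_name
        for r in index
        if r.column == column and r.min_value <= hi and r.max_value >= lo
    ]


# Toy index: three files with disjoint ranges on a hypothetical column "ts".
index = [
    ColumnRange("f1.parquet", "ts", 0, 99),
    ColumnRange("f2.parquet", "ts", 100, 199),
    ColumnRange("f3.parquet", "ts", 200, 299),
]

# A filter like `ts BETWEEN 150 AND 220` only needs to open f2 and f3.
candidates = prune_files(index, "ts", 150, 220)
print(candidates)
```

Because the ranges are looked up from one index instead of from every file footer, planning cost depends on the index size and not on the number of files opened.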
Attachments
Issue Links
Issues in epic
HUDI-4245 | Support nested fields in Column Stats Index | Open | Alexey Kudinkin
HUDI-3495 | Reading keys in parallel from HoodieMetadataMergedLogRecordReader may lead to empty results even if key exists | Closed | Yue Zhang
HUDI-3777 | Optimize column stats storage | Open | Unassigned
HUDI-4033 | Aggregated cols stats at partition level in col stats partition in MDT | Open | Unassigned
HUDI-3317 | Partition specific pointed lookup/reading strategy for metadata table | Open | Sagar Sumit
HUDI-3783 | Fix HoodieTestTable harness to also properly validate Column Stats | Open | Unassigned
HUDI-3166 | Implement new HoodieIndex based on metadata indices | Patch Available | Sagar Sumit
HUDI-3288 | Partition specific compaction strategy for the metadata table | Open | Sagar Sumit
HUDI-3866 | Support Data Skipping for MOR | Open | Alexey Kudinkin
HUDI-3323 | Refactor: Metadata various partitions payload merging using delegation pattern | Open | Unassigned
HUDI-3368 | Support metadata bloom index for secondary keys | Reopened | Sagar Sumit
HUDI-3809 | Make full scan optional for metadata partitions other than FILES | Open | Unassigned
HUDI-3914 | Enhance TestColumnStatsIndex to test indexing with regular writes and table services | Open | Unassigned
HUDI-3374 | metadata index for secondary keys | Closed | Sagar Sumit
HUDI-3364 | Support column stats indexing for subset of columns | Closed | Sagar Sumit
HUDI-686 | Implement BloomIndexV2 that does not depend on memory caching | Reopened | Rajesh Mahindra
HUDI-2602 | RFC: Metadata based range index | Closed | Manoj Govindassamy
HUDI-2700 | Metadata based bloom index - PoC | Closed | Manoj Govindassamy
HUDI-1295 | Implement: Metadata based bloom index - write path | Closed | Manoj Govindassamy
HUDI-3316 | HoodieColumnRangeMetadata doesn't include all statistics for the column | Closed | Manoj Govindassamy
HUDI-1296 | Support Metadata Table in Spark DataSource | Closed | Alexey Kudinkin
HUDI-1613 | Document for interfaces for Spark on Hive & Spark Datasource integrations with index | Closed | Unassigned
HUDI-2518 | Implement stats/range tracking as a part of Metadata table | Closed | Manoj Govindassamy
HUDI-2714 | Benchmark MetaIndex performance w/ bloom and column stat metadata | Closed | Manoj Govindassamy
HUDI-2581 | Analyze metadata size estimate in hudi with Hfile for col stats partition | Closed | sivabalan narayanan
HUDI-2586 | Design col stats partition in Metadata table | Closed | Manoj Govindassamy
HUDI-2587 | Impl metadata table based bloom index | Closed | Manoj Govindassamy
HUDI-2588 | Test metadata based bloom index | Closed | Manoj Govindassamy
HUDI-2657 | Make inlining configurable based on diff use-case. | Open | Prashant Wason
HUDI-2705 | Metadata based column stats index - PoC | Closed | Manoj Govindassamy
HUDI-2973 | Rewrite/re-publish RFC for Data skipping index | Closed | Sagar Sumit
HUDI-3143 | Support multiple file groups for metadata table index partitions | Closed | Manoj Govindassamy
HUDI-3144 | Metadata table getRecordsByKeys() operations with inline reading has poor performance | Closed | Manoj Govindassamy
HUDI-3141 | Metadata table getAllFilesInPartition() crashes with NullPointerException | Closed | Manoj Govindassamy
HUDI-3160 | Column Stats index should use the same column for the index key in the write and read code path | Closed | Manoj Govindassamy
HUDI-3219 | Summary of performance related issues that MetaIndex would address | Closed | Manoj Govindassamy
HUDI-3804 | Partition metadata is not properly created for Column Stats | Closed | sivabalan narayanan
HUDI-3273 | Performance: Metadata table log file scanning and base file merging are repeated for each keys lookup request | Closed | Manoj Govindassamy
HUDI-3260 | Support column stat index for multiple columns | Closed | Manoj Govindassamy
HUDI-2589 | RFC: Metadata based index for bloom filter and column stats | Closed | Manoj Govindassamy
HUDI-3356 | Conversion of write stats to metadata index records should use HoodieData throughout | Closed | Sagar Sumit
HUDI-3142 | Metadata new Indices initialization during table creation | Closed | Sagar Sumit
HUDI-3203 | Meta bloom index should use the bloom filter type property to construct back the bloom filter instant | Closed | Sagar Sumit
HUDI-1492 | Enhance DeltaWriteStat with block level metadata correctly for storage schemes that support appends | Closed | Manoj Govindassamy
HUDI-3258 | Support multiple metadata index partitions - bloom and column stats | Closed | Sagar Sumit
HUDI-2584 | Unit tests for bloom filter index based out of metadata table. | Closed | Sagar Sumit
HUDI-3382 | Support removal of bloom and column stats indexes | Closed | Sagar Sumit
HUDI-3486 | Value and null count in metadata col stats from metadata table are wrong | Closed | Ethan Guo
HUDI-3327 | Support bloom filter indexing for all columns/fields | Closed | Manoj Govindassamy
HUDI-3332 | Handle all supported data types for column stats index | Closed | Manoj Govindassamy
HUDI-3167 | Update RFC27 with the design for the new HoodieIndex type based on metadata indices | Open | Unassigned
HUDI-3181 | Address test failures after enabling metadata index for bloom filters and column stats | Closed | Sagar Sumit
HUDI-3514 | Leverage MT Column-stats Index in HoodieFileIndex | Closed | Alexey Kudinkin
HUDI-3653 | Clean up Column Stats Index introduced along with Spatial Curves Clustering | Closed | Alexey Kudinkin
HUDI-3324 | Query Integration: Support returning file names matching the given columns and ranges | Closed | Alexey Kudinkin
HUDI-3325 | Query Integration: Util to get aggregate columns ranges across all files from the column index | Closed | Alexey Kudinkin
HUDI-3326 | Query Integration: HoodieFileReader should expose API for getting range metadata | Closed | Alexey Kudinkin
HUDI-3405 | Query Integration: Graceful fallback when indexes are not available | Closed | Alexey Kudinkin
HUDI-3594 | Support standard Spark functions in Filter Exprs in Data Skipping | Closed | Alexey Kudinkin
HUDI-3655 | AvroRuntimeException from TestLayoutOptimization regarding column stats | Closed | Alexey Kudinkin
HUDI-3684 | NPE in ParquetUtils | Closed | Alexey Kudinkin
HUDI-3731 | Failure to merging Column Stats Records | Closed | Alexey Kudinkin
HUDI-3663 | Make sure Column Stats is able to index all Columns | Closed | Alexey Kudinkin
HUDI-3664 | Column Stats are computed incorrectly right now | Closed | Alexey Kudinkin
HUDI-3708 | Upsert to metadata table fails due to schema change | Closed | Ethan Guo
HUDI-3739 | Fix translation of isNotNull predicates in Data Skipping | Closed | Alexey Kudinkin
HUDI-3760 | Rebase ColStats onto fetching Records by Column prefix | Closed | Alexey Kudinkin
HUDI-3743 | Support DELETE_PARTITION for metadata table | Closed | Sagar Sumit
HUDI-3714 | Validating Data Skipping on MT | Closed | Alexey Kudinkin
HUDI-3611 | Benchmark Data Skipping using MT | Closed | Alexey Kudinkin
HUDI-3773 | Revisit performance of bloom filter writing flow in MDT for large batch ingestion | Closed | Ethan Guo
HUDI-3782 | Update metadata partitions table config when any of them is enabled/disabled | Closed | Sagar Sumit
HUDI-3776 | Fix BloomIndex incorrectly using ColStats to lookup records locations | Closed | Sagar Sumit
HUDI-3841 | Data Skipping is not working correctly in the presence of Schema Evolution | Closed | Alexey Kudinkin
HUDI-3812 | Make sure Data Skipping respects Metadata Table config | Closed | Alexey Kudinkin
HUDI-3834 | Evaluate MT Column Stats Performance | Closed | Alexey Kudinkin
HUDI-3867 | Disable Data Skipping by default in 0.11 | Closed | Alexey Kudinkin
HUDI-3865 | Support Data Skipping for MOR | Closed | Alexey Kudinkin
HUDI-3971 | Revisit UT/FTs for colstats and bloom filter and fill gaps | Open | Unassigned
HUDI-4202 | Make sure Column Stats partition is cached after first time being read | Closed | Alexey Kudinkin
HUDI-4250 | Optimize Data Skipping to enable in-memory Column Stats Index | Closed | Alexey Kudinkin
HUDI-3806 | Improve HoodieBloomIndex using bloom_filter and col_stats in MDT | Closed | Ethan Guo
HUDI-4851 | Fix CSI not supporting InSet operator | Closed | Alexey Kudinkin
HUDI-3259 | Code Refactor: Common prep records commit util for Spark and Flink | Closed | Jonathan Vexler
HUDI-5291 | NPE in column stats for null values | Closed | Alexey Kudinkin
HUDI-5608 | Support decimals w/ precision > 30 in Column Stats | Open | Unassigned
HUDI-6762 | Remove usages of MetadataRecordsGenerationParams | Open | Unassigned
HUDI-1822
RFC-27 Multi Modal Indexing