Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Version: 0.14.0
Description
Currently, querying a table with multiple partition columns can easily cause an OOM on the driver.
For example:
CREATE TABLE hudi_test.tmp_hudi_test_1 (
id string,
name string,
dt bigint,
day STRING COMMENT 'date partition',
hour INT COMMENT 'hour partition'
)using hudi
OPTIONS ('hoodie.datasource.write.hive_style_partitioning' 'false', 'hoodie.datasource.meta.sync.enable' 'false', 'hoodie.datasource.hive_sync.enable' 'false')
tblproperties (
'primaryKey' = 'id',
'type' = 'mor',
'preCombineField'='dt',
'hoodie.index.type' = 'BUCKET',
'hoodie.bucket.index.hash.field' = 'id',
'hoodie.bucket.index.num.buckets'=512
)
PARTITIONED BY (day,hour);
select count(1) from hudi_test.tmp_hudi_test_1 where day='2023-10-17';
This query lists a very large number of file statuses on the driver, and the driver can run out of memory, for example when the partition day='2023-10-17' holds hundreds of billions of records.
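As a rough illustration of the scale, the number of files under a single day can be estimated from the table definition. This is only a back-of-the-envelope sketch. It assumes 24 hour partitions per day, one file group per bucket per partition, and a hypothetical number of log files per file group; the function name and the log-file count are made up for illustration and are not taken from this report. It counts only the files under the filtered day and does not model what the driver actually lists.

```python
# Rough estimate of the file statuses under day='2023-10-17'.
# Assumptions (not from the report): 24 hour partitions per day, one file
# group per bucket per partition, and a hypothetical log-file count.


def estimate_file_statuses(hour_partitions: int,
                           num_buckets: int,
                           log_files_per_group: int) -> int:
    """Return base files plus log files across all file groups."""
    file_groups = hour_partitions * num_buckets
    files_per_group = 1 + log_files_per_group  # one base file + its log files
    return file_groups * files_per_group


if __name__ == "__main__":
    # 512 buckets comes from 'hoodie.bucket.index.num.buckets' in the DDL;
    # 3 log files per file group is a hypothetical value.
    print(estimate_file_statuses(24, 512, 3))  # 49152
```

Older file slices that have not been cleaned yet would multiply this figure further.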
Commit: 7c79ebee1ff1c9a0f5117252cb12fa2f03ac4b24 (master)