Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Branch: master : 4819ea44723b87a68406d248782861cf6e5d3305
Description
Here is the DDL for the dataset:

create dataset ds_tweet(typeTweet) if not exists
    primary key id
    using compaction policy prefix
        (("max-mergable-component-size"="134217728"),
         ("max-tolerance-component-count"="10"))
    with filter on create_at;

create index text_idx if not exists on ds_tweet("text") type keyword;
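As a quick sanity check (mine, not stated in the issue), the "max-mergable-component-size" value appears to be in bytes, and 134217728 decodes to exactly 128 MiB:

```python
# "max-mergable-component-size" is specified in bytes (my assumption,
# inferred from the value used in the DDL above).
max_mergable_component_size = 134217728
print(max_mergable_component_size / (1024 * 1024))  # 128.0 MiB
```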
In this case, I want to keep components small, around 128 MB each. During the data ingestion phase this works well, and each text_idx component is also small (~80 MB each). I assume it also followed the component size constraint?
After ingestion, I found that I needed to build another index:

create index time_idx if not exists on ds_tweet(create_at) type btree;

When it finished, I found that time_idx did not follow the constraint and ended up with one giant 1.2 GB component on each partition.
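To measure component sizes on disk when reproducing this, one option is to sum file sizes per component in the index directory. This is a hypothetical helper of mine, not part of AsterixDB, and the grouping rule (files of one LSM component sharing a name prefix before the first '_') is an assumption about the on-disk layout:

```python
import os
from collections import defaultdict

def component_sizes(index_dir):
    """Sum file sizes per on-disk component, grouping files by the
    prefix before the first '_' (assumed naming convention)."""
    sizes = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            sizes[name.split("_", 1)[0]] += os.path.getsize(path)
    return dict(sizes)
```

Pointing this at the time_idx directory under each partition's iodevice (path depends on your configuration) would show whether any single component exceeds the 128 MB cap.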