Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Do
None
None
3
Description
Many users want to know the average size of a record in Hudi storage, so that they can derive suitable values for their Bloom filter index configuration.
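As a rough illustration of why the average record size matters for Bloom index tuning, here is a minimal sketch. It assumes records are roughly uniform in size and that the relevant size is on-disk bytes per record. The helper names are hypothetical. The first helper estimates how many records fit in one base file (for example, one sized by `hoodie.parquet.max.file.size`), which is a starting point for `hoodie.index.bloom.num_entries`. The second applies the standard Bloom filter sizing formula m = -n ln(p) / (ln 2)^2 for a target false positive probability such as `hoodie.index.bloom.fpp`.

```python
import math


def estimate_records_per_file(max_file_size_bytes: int, avg_record_size_bytes: float) -> int:
    """Approximate how many records fit in one base file of the given max size.

    Hypothetical helper: assumes every record is close to the average size.
    """
    if avg_record_size_bytes <= 0:
        raise ValueError("avg_record_size_bytes must be positive")
    return int(max_file_size_bytes // avg_record_size_bytes)


def bloom_filter_bits(num_entries: int, fpp: float) -> int:
    """Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    if num_entries <= 0 or not 0 < fpp < 1:
        raise ValueError("need num_entries > 0 and 0 < fpp < 1")
    return math.ceil(-num_entries * math.log(fpp) / (math.log(2) ** 2))


if __name__ == "__main__":
    # A 128 MB file of ~1 KB records holds about 131072 records.
    records = estimate_records_per_file(128 * 1024 * 1024, 1024)
    print(records, bloom_filter_bits(records // 2, 1e-9))
```

Hudi's configuration docs justify the default of `hoodie.index.bloom.num_entries` with a similar back-of-the-envelope estimate (128 MB files with ~1 KB records, default set to roughly half the resulting count). Without a reliable average record size, though, users have to guess the divisor.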
At present, there is no easy way for end users to get the record size. Even with hudi-cli, it has to be worked out from the commit metadata with a rough manual calculation. It would be better to store the average record size (total bytes written / total records written) in WriteStats as well as in the commit metadata. hudi-cli could then expose this information alongside "commit showpartitions", or through a new command such as "commit showmetadata".
Until then, the average size can be computed from the commit metadata as bytes written / records written.
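The workaround above can be sketched as a small script over a commit's metadata JSON. It produces per-partition and overall averages, similar to what a "commit showpartitions" column might show. This sketch rests on an assumption: the field names `partitionToWriteStats`, `numWrites` and `totalWriteBytes` are taken from the JSON serialization of `HoodieCommitMetadata` / `HoodieWriteStat`, and you should verify them against your Hudi version. The function name and sample values are hypothetical.

```python
import json
from typing import Dict, Tuple


def avg_record_sizes(commit_json: str) -> Tuple[Dict[str, float], float]:
    """Return (per-partition average bytes/record, overall average bytes/record).

    Assumes the commit metadata JSON has a "partitionToWriteStats" map whose
    values are lists of write stats carrying "numWrites" and "totalWriteBytes".
    """
    meta = json.loads(commit_json)
    per_partition: Dict[str, float] = {}
    total_bytes = 0
    total_records = 0
    for partition, stats in meta.get("partitionToWriteStats", {}).items():
        p_bytes = sum(s.get("totalWriteBytes", 0) for s in stats)
        p_records = sum(s.get("numWrites", 0) for s in stats)
        if p_records:
            per_partition[partition] = p_bytes / p_records
        total_bytes += p_bytes
        total_records += p_records
    overall = total_bytes / total_records if total_records else 0.0
    return per_partition, overall


# Hypothetical commit metadata with made-up numbers, for illustration only.
SAMPLE = json.dumps({
    "partitionToWriteStats": {
        "2020/01/01": [
            {"numWrites": 1000, "totalWriteBytes": 1024000},
            {"numWrites": 500, "totalWriteBytes": 512000},
        ],
        "2020/01/02": [{"numWrites": 200, "totalWriteBytes": 409600}],
    }
})

if __name__ == "__main__":
    per_partition, overall = avg_record_sizes(SAMPLE)
    for partition, avg in sorted(per_partition.items()):
        print(f"{partition}: {avg:.1f} bytes/record")
    print(f"overall: {overall:.1f} bytes/record")
```

Summing bytes and records before dividing (rather than averaging the per-file averages) weights each file by its record count. This matches the "total bytes written / total records written" definition proposed above.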
Issue Links
- links to