Apache Hudi / HUDI-1570

Add Avg record size in commit metadata


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Utilities
    • Labels: None

    Description

      Many users want to know the average record size of their data in Hudi storage. They need it to derive their bloom filter config values.

      As of now, there is no easy way for end users to fetch the record size. With hudi-cli it can be worked out from the commit metadata, but only through a rough manual calculation. It would be better to store the average record size (total bytes written / total records written) in WriteStats as well as in the commit metadata. hudi-cli could then expose this info along with "commit showpartitions", or through a new command such as "commit showmetadata".

      As of now, the average size can be estimated by dividing the bytes written by the records written, both taken from the commit metadata.
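
      The manual calculation can be sketched roughly as follows. This is only an illustration, not part of Hudi: it assumes the commit metadata has been loaded as JSON and that it exposes a "partitionToWriteStats" map whose entries carry "totalWriteBytes" and "numWrites". The helper name avg_record_size and the sample values are made up for the example.

```python
import json


def avg_record_size(commit_metadata: dict) -> dict:
    """Estimate average record size in bytes, overall and per partition.

    Assumes the (hypothetical here) layout
    {"partitionToWriteStats": {partition: [{"totalWriteBytes": ..., "numWrites": ...}]}}.
    """
    per_partition = {}
    total_bytes = 0
    total_records = 0
    stats_by_partition = commit_metadata.get("partitionToWriteStats", {})
    for partition, stats in stats_by_partition.items():
        part_bytes = sum(s.get("totalWriteBytes", 0) for s in stats)
        part_records = sum(s.get("numWrites", 0) for s in stats)
        total_bytes += part_bytes
        total_records += part_records
        # Avoid dividing by zero when a partition wrote no records.
        per_partition[partition] = part_bytes / part_records if part_records else None
    overall = total_bytes / total_records if total_records else None
    return {"overall": overall, "per_partition": per_partition}


# Made-up sample metadata, for illustration only.
sample = json.loads("""
{
  "partitionToWriteStats": {
    "2021/01/01": [
      {"numWrites": 1000, "totalWriteBytes": 120000},
      {"numWrites": 500, "totalWriteBytes": 60000}
    ],
    "2021/01/02": [
      {"numWrites": 2000, "totalWriteBytes": 200000}
    ]
  }
}
""")

result = avg_record_size(sample)
print(result["per_partition"])
print(result["overall"])
```

      Since the byte counts describe files as written to storage, the result is an on-storage estimate, which is the figure this ticket proposes to record directly.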

       

       


          People

            shivnarayan sivabalan narayanan
