Apache Hudi / HUDI-1570

Add Avg record size in commit metadata

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Utilities
    • Labels: None

    Description

      Many users want to know the avg record size of their data in Hudi storage, because they need it to derive their bloom config values.

      As of now, there is no easy way for the end user to fetch the record size. With hudi-cli it can be worked out from the commit metadata, but only through a rough manual calculation. So it would be better to store the avg record size (total bytes written / total records written) in WriteStats as well as in the commit metadata. hudi-cli could then expose this info along with "commit showpartitions", or through another command such as "commit showmetadata".

      Until then, the avg size can be calculated from the commit metadata as bytes written / records written.
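
      The manual calculation described above could be sketched roughly as follows. This is only an illustration, not the proposed change itself. It assumes the commit metadata JSON has a "partitionToWriteStats" map whose write stats carry "totalWriteBytes" and "numWrites" fields. The helper name avg_record_size and the sample values are hypothetical.

```python
import json
from typing import Optional


def avg_record_size(commit_metadata: dict) -> Optional[float]:
    """Return total bytes written / total records written.

    Returns None when the commit wrote no records.
    Assumed layout: {"partitionToWriteStats": {partition: [write_stat, ...]}}
    """
    total_bytes = 0
    total_records = 0
    for write_stats in commit_metadata.get("partitionToWriteStats", {}).values():
        for stat in write_stats:
            # Field names are assumptions about the write stat JSON.
            total_bytes += stat.get("totalWriteBytes", 0)
            total_records += stat.get("numWrites", 0)
    if total_records == 0:
        return None
    return total_bytes / total_records


# Hypothetical commit metadata, trimmed to the fields used above.
sample = json.loads("""
{
  "partitionToWriteStats": {
    "2021/01/30": [
      {"numWrites": 1000, "totalWriteBytes": 120000}
    ],
    "2021/01/31": [
      {"numWrites": 500, "totalWriteBytes": 60000},
      {"numWrites": 500, "totalWriteBytes": 60000}
    ]
  }
}
""")

print(avg_record_size(sample))  # 240000 bytes / 2000 records = 120.0
```

      The result is bytes on storage per record, since it is based on bytes written to the data files rather than on the in-memory record size.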

       

       

      Attachments

        1. Screen Shot 2021-01-31 at 7.05.55 PM.png
          90 kB
          sivabalan narayanan

          People

            Assignee: sivabalan narayanan (shivnarayan)
            Reporter: sivabalan narayanan (shivnarayan)
            Votes: 0
            Watchers: 1
