  1. Apache Hudi
  2. HUDI-1570

Add Avg record size in commit metadata

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Utilities
    • 3

    Description

      Many users want to know the average record size of their data in Hudi storage, because they need it to derive their bloom config values.

      Currently there is no easy way for an end user to fetch the record size. With hudi-cli it can be deciphered from the commit metadata, but only through a rough manual calculation. It would be better to store the average record size (total bytes written / total records written) in WriteStats as well as in the commit metadata. hudi-cli could then expose this value alongside "commit showpartitions", or through a new command such as "commit showmetadata".

      For now, the average size can be calculated from the commit metadata as bytes written / records written.
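The manual calculation described above (bytes written / records written) could be sketched roughly as follows. This is only an illustration, not part of Hudi: the function name `avg_record_size` is hypothetical, the sample JSON is made up, and the field names `partitionToWriteStats`, `totalWriteBytes`, and `numWrites` are assumed to match the layout of the commit metadata JSON, so verify them against an actual commit file before relying on this.

```python
import json
from typing import Optional


def avg_record_size(commit_metadata: dict) -> Optional[float]:
    """Return total bytes written / total records written, or None if no records.

    Assumes the commit metadata maps each partition path to a list of
    write stats carrying `totalWriteBytes` and `numWrites` (assumed names).
    """
    total_bytes = 0
    total_records = 0
    for write_stats in commit_metadata.get("partitionToWriteStats", {}).values():
        for stat in write_stats:
            total_bytes += stat.get("totalWriteBytes", 0)
            total_records += stat.get("numWrites", 0)
    if total_records == 0:
        return None  # avoid dividing by zero for empty commits
    return total_bytes / total_records


# Made-up sample with two partitions, for illustration only.
sample = json.loads("""
{
  "partitionToWriteStats": {
    "2021/01/30": [{"totalWriteBytes": 1000, "numWrites": 10}],
    "2021/01/31": [{"totalWriteBytes": 3000, "numWrites": 30}]
  }
}
""")

print(avg_record_size(sample))  # 4000 bytes / 40 records = 100.0
```

The result is an average over everything written in one commit, so it mixes all partitions and files; a per-partition figure would need the same division applied inside the outer loop.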

       

       

      Attachments

        1. Screen Shot 2021-01-31 at 7.05.55 PM.png
          90 kB
          sivabalan narayanan

            People

              Assignee: jonvex Jonathan Vexler
              Reporter: shivnarayan sivabalan narayanan
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 2h
                  Remaining: 2h
                  Logged: Not Specified