Apache Hudi / HUDI-8555

Add support for nested field col stats generation for log files

Details

    • Type: Improvement
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: metadata
    • Labels: None

    Description

      Out of the box, Hudi generates col stats only for top-level fields. However, users can override the list of columns for which Hudi generates col stats.
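
A minimal sketch of what such an override might look like when assembling writer properties. The helper name `columnStatsProps` and the column names `fare` and `address.city` are hypothetical; the column list key is the one quoted in this issue, and the two enable flags are assumed to be the standard Hudi metadata table settings.

```java
import java.util.Properties;

public class ColStatsConfigSketch {
    // Hypothetical helper: builds writer properties that restrict col stats
    // generation to an explicit, comma-separated list of columns.
    static Properties columnStatsProps(String... columns) {
        Properties props = new Properties();
        // Assumed standard flags for the metadata table and the col stats index.
        props.setProperty("hoodie.metadata.enable", "true");
        props.setProperty("hoodie.metadata.index.column.stats.enable", "true");
        // Config key named in this issue.
        props.setProperty("hoodie.metadata.index.column.stats.column.list",
            String.join(",", columns));
        return props;
    }

    public static void main(String[] args) {
        // "address.city" stands in for a nested field, the case this issue is about.
        Properties props = columnStatsProps("fare", "address.city");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Passing a dotted name such as `address.city` is the kind of explicit override that exposes the gap described in this issue.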

       

      When we tested this with a nested field, we found a gap. Hudi generates col stats correctly for base files, even for nested fields, but it does not generate them for log files.

      https://github.com/apache/hudi/blob/fa5878d9c46f5c824ae56a9ad56ef90b0bc37a19/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java#L443 

      The linked code honors only top-level fields, so a nested column name never matches a field there.
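
To illustrate why a top-level-only lookup misses nested columns, here is a small sketch that models a record as nested `Map`s. This is not the code in `HoodieAppendHandle`; the class, both helper methods, and the sample record are hypothetical and only demonstrate the difference between matching the whole column name against top-level keys and walking a dotted path.

```java
import java.util.Map;
import java.util.Optional;

public class NestedFieldLookupSketch {
    // Top-level-only lookup: the whole column name is treated as one key.
    static Optional<Object> topLevelLookup(Map<String, Object> record, String column) {
        return Optional.ofNullable(record.get(column));
    }

    // Nested-aware lookup: walks the dotted path one segment at a time.
    static Optional<Object> nestedLookup(Map<String, Object> record, String column) {
        Object current = record;
        for (String part : column.split("\\.")) {
            if (!(current instanceof Map)) {
                return Optional.empty(); // path goes deeper than the data does
            }
            current = ((Map<?, ?>) current).get(part);
            if (current == null) {
                return Optional.empty(); // segment not present
            }
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        Map<String, Object> record =
            Map.of("fare", 12.5, "address", Map.of("city", "Pune"));
        System.out.println(topLevelLookup(record, "address.city").isPresent()); // false
        System.out.println(nestedLookup(record, "address.city").isPresent());   // true
    }
}
```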

       

      So, there are two fixes here. 

      Fix1: Avoid generating col stats for nested fields, even for base files. Also throw an exception if someone explicitly sets a nested field in "hoodie.metadata.index.column.stats.column.list". 
      Fix2: As a follow-up, support col stats generation for nested fields. 
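
The validation part of Fix1 could be sketched roughly as follows. This is an assumption about the shape of the check, not the actual patch: the class and method names are hypothetical, a nested field is assumed to be identifiable by a `.` in its name, and `IllegalArgumentException` stands in for whatever exception type Hudi would actually throw.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ColStatsColumnValidatorSketch {
    static final String COLUMN_LIST_KEY = "hoodie.metadata.index.column.stats.column.list";

    // Parses the comma-separated column list and rejects nested (dotted) names.
    static List<String> validateColumns(String columnList) {
        List<String> columns = Arrays.stream(columnList.split(","))
            .map(String::trim)
            .filter(column -> !column.isEmpty())
            .collect(Collectors.toList());

        // Assumption: a dot in the name marks a nested field path.
        List<String> nested = columns.stream()
            .filter(column -> column.contains("."))
            .collect(Collectors.toList());

        if (!nested.isEmpty()) {
            throw new IllegalArgumentException(
                "Nested fields are not supported in " + COLUMN_LIST_KEY + ": " + nested);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(validateColumns("fare, rider"));
        try {
            validateColumns("fare,address.city");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at configuration time keeps base files and log files consistent until Fix2 lands, rather than silently producing stats for one file type only.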

       

      Fix1 is a blocker for the 1.0 release. We can probably defer Fix2 until later. 

          People

            Assignee: shivnarayan sivabalan narayanan
            Reporter: shivnarayan sivabalan narayanan

              Agile

                Active Sprint:
                Hudi 1.0 Blockers+Bugs Sprint ends 19/Nov/24