Apache Hudi / HUDI-8555

Add support for nested field col stats generation for log files

Details

    • Type: Improvement
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: metadata
    • Labels: None

    Description

      Out of the box, Hudi generates col stats only for top-level fields, but users can override the list of columns for which Hudi generates col stats.

       

      When we tested with a nested field, we found a gap: Hudi generates col stats correctly for base files, even for nested fields, but does not generate them for log files.

      https://github.com/apache/hudi/blob/fa5878d9c46f5c824ae56a9ad56ef90b0bc37a19/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java#L443 

      The linked code honors only top-level fields.
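
      A minimal illustration of how such a gap can arise. This is not Hudi's code: the record is modeled as a plain nested Map, and both the class name NestedFieldLookup and the dotted "address.city" notation for a nested field are assumptions made for the example. A lookup that treats the column name as a single top-level key finds nothing, while a lookup that walks the path one segment at a time reaches the value.

```java
import java.util.Map;

public class NestedFieldLookup {
    // Top-level lookup: treats the whole column name as one key.
    static Object topLevelLookup(Map<String, Object> record, String column) {
        return record.get(column);
    }

    // Path-aware lookup: splits on '.' and descends one level per segment.
    @SuppressWarnings("unchecked")
    static Object nestedLookup(Map<String, Object> record, String column) {
        Object current = record;
        for (String part : column.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // path goes deeper than the record does
            }
            current = ((Map<String, Object>) current).get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical record with one top-level field and one nested struct.
        Map<String, Object> record = Map.of("id", 1, "address", Map.of("city", "Pune"));
        System.out.println(topLevelLookup(record, "address.city")); // null
        System.out.println(nestedLookup(record, "address.city"));   // Pune
    }
}
```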

       

      So we need two fixes:

      Fix1: Stop generating col stats for nested fields, even for base files, and throw an exception if someone explicitly sets a nested field in "hoodie.metadata.index.column.stats.column.list".
      Fix2: As a follow-up, support col stats generation for nested fields.

       

      Fix1 is a blocker for the 1.0 release. Maybe we can punt Fix2 to later.
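
      A rough sketch of the validation Fix1 describes, under stated assumptions. It is not the actual patch: the class ColStatsColumnValidator and its validate method are hypothetical, the column list is assumed to be comma-separated, nested fields are assumed to be written as dotted paths, and IllegalArgumentException stands in for whatever exception type the real fix would use. Only the config key comes from this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class ColStatsColumnValidator {
    // Config key named in this issue.
    static final String COLUMN_LIST_KEY = "hoodie.metadata.index.column.stats.column.list";

    // Parses the configured column list and rejects nested fields.
    // Assumption: a '.' in a column name marks a nested field.
    static List<String> validate(String columnList) {
        List<String> columns = new ArrayList<>();
        for (String raw : columnList.split(",")) {
            String column = raw.trim();
            if (column.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            if (column.contains(".")) {
                throw new IllegalArgumentException(
                    "Nested field '" + column + "' is not supported in " + COLUMN_LIST_KEY);
            }
            columns.add(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(validate("id, price")); // accepted: top-level fields only
        try {
            validate("id,address.city");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      Failing fast at configuration time keeps base files and log files consistent until Fix2 lands, rather than silently producing stats for one file type and not the other.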

          People

            Assignee: shivnarayan sivabalan narayanan
            Reporter: shivnarayan sivabalan narayanan