Apache Hudi / HUDI-8586

Fix partition stats index to only index supported types

Details

    Description

      There appear to be data type mismatches between base files and log files when we generate column stats, so merging the two fails with:

      java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.time.chrono.ChronoLocalDate (java.lang.Integer and java.time.chrono.ChronoLocalDate are in module java.base of loader 'bootstrap') 

      ref patch: https://github.com/apache/hudi/pull/12331 

      For example, for the "current_date" column, the data type in the Parquet base file is:

      required int32 current_date (DATE)

      while in the log files the data type is:

      {"type":"int","logicalType":"date"}
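      A minimal sketch of how this mismatch can surface as the ClassCastException above. It assumes (not confirmed by this issue) that the base-file stat is materialized as a java.time.LocalDate while the log-file stat stays a raw Integer (days since the epoch, per the Avro "date" logical type). The class and the naiveMin helper are hypothetical and are not Hudi code.

```java
import java.time.LocalDate;

public class DateStatsMismatch {

    // Hypothetical stand-in for a generic min-merge over column-stats values:
    // both values are treated as raw Comparable with no runtime type check.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object naiveMin(Object a, Object b) {
        return ((Comparable) a).compareTo(b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Assumed base-file stat: Parquet "int32 (DATE)" surfaced as a LocalDate.
        Object fromBaseFile = LocalDate.of(2024, 11, 25);
        // Assumed log-file stat: Avro {"type":"int","logicalType":"date"} left
        // as a raw int, i.e. days since 1970-01-01.
        Object fromLogFile = 20000;

        // Both values represent a date, but their runtime types differ.
        System.out.println("log value as a date: " + LocalDate.ofEpochDay((Integer) fromLogFile));

        try {
            naiveMin(fromBaseFile, fromLogFile);
        } catch (ClassCastException e) {
            // LocalDate.compareTo expects a ChronoLocalDate, so the Integer
            // cannot be cast and the merge fails.
            System.out.println("merge failed: " + e.getMessage());
        }
    }
}
```

      Running main first prints "log value as a date: 2024-10-04". It then prints the ClassCastException message, which names java.lang.Integer and java.time.chrono.ChronoLocalDate, as in the reported error.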

       

      For now, let's support partition stats only for scalar/primitive types and skip generating partition stats for all other data types.

      This keeps the user experience seamless and free of random errors, even if it means some columns are not indexed.
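      A minimal sketch of the proposed filtering step: before building partition stats, keep only columns whose type is a plain primitive and skip everything else. The Column descriptor, the SUPPORTED_PRIMITIVES allow-list, and the choice to treat any column with a logical type (such as the date example above) as unsupported are all assumptions for illustration. They do not reflect Hudi's actual schema classes or the type list in the fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionStatsColumnFilter {

    // Hypothetical, simplified column descriptor: a primitive type name plus an
    // optional logical type (null when the column is a plain primitive).
    static final class Column {
        final String name;
        final String primitiveType;
        final String logicalType;

        Column(String name, String primitiveType, String logicalType) {
            this.name = name;
            this.primitiveType = primitiveType;
            this.logicalType = logicalType;
        }
    }

    // Assumed allow-list of scalar types; not Hudi's actual list.
    private static final Set<String> SUPPORTED_PRIMITIVES = new HashSet<>(Arrays.asList(
            "boolean", "int", "long", "float", "double", "string"));

    // A column qualifies only if it is a plain primitive with no logical type.
    static boolean isSupportedForPartitionStats(Column c) {
        return c.logicalType == null && SUPPORTED_PRIMITIVES.contains(c.primitiveType);
    }

    // Returns the names of the columns to index, silently skipping the rest.
    static List<String> columnsToIndex(List<Column> columns) {
        List<String> result = new ArrayList<>();
        for (Column c : columns) {
            if (isSupportedForPartitionStats(c)) {
                result.add(c.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
                new Column("id", "long", null),
                new Column("current_date", "int", "date"),
                new Column("tags", "array", null),
                new Column("rider", "string", null));
        System.out.println(columnsToIndex(schema)); // prints [id, rider]
    }
}
```

      Skipping unsupported columns, rather than throwing an error, matches the trade-off described above. Those columns simply get no partition stats, and writes and queries proceed without errors.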


            People

              codope Sagar Sumit
              shivnarayan sivabalan narayanan
              Votes: 0
              Watchers: 1