Details
Type: Improvement
Status: Closed
Priority: Blocker
Resolution: Fixed
Description
It looks like there are data type mismatches between base files and log files when we generate column stats. When we try to merge the stats from the two sources, we run into the following exception:
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.time.chrono.ChronoLocalDate (java.lang.Integer and java.time.chrono.ChronoLocalDate are in module java.base of loader 'bootstrap')
ref patch: https://github.com/apache/hudi/pull/12331
For example, for the "current_date" column,
the data type from parquet is:
required int32 current_date (DATE)
while in log files, the data type is:
{"type":"int","logicalType":"date"}
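The failure mode can be reproduced in isolation. The sketch below is not Hudi code: `ColStatsCastDemo` and `mergeMin` are hypothetical names, and it simply assumes that one side of the merge holds a `java.time.LocalDate` while the other holds the raw `Integer` epoch-day value. Which reader produces which representation is an assumption here, not something this ticket states.

```java
import java.time.LocalDate;

class ColStatsCastDemo {
    // Hypothetical merge helper: returns the smaller of two column-stat values.
    // Raw Comparable is used on purpose, mirroring a merge that trusts both
    // sides to carry the same Java type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Comparable mergeMin(Comparable a, Comparable b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Assumption: one source yields a LocalDate for the DATE column ...
        LocalDate asDate = LocalDate.of(2024, 11, 25);
        // ... and the other yields the underlying int (days since epoch).
        Integer asEpochDay = Math.toIntExact(LocalDate.of(2024, 11, 26).toEpochDay());
        try {
            mergeMin(asDate, asEpochDay);
        } catch (ClassCastException e) {
            // LocalDate.compareTo casts its argument to ChronoLocalDate,
            // which fails for an Integer.
            System.out.println("merge failed: " + e.getMessage());
        }
    }
}
```

When both values share the same representation, the same helper compares them without error, which is why the problem only shows up once stats from both file types are merged.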
For now, let's support partition stats only for scalar/primitive types. For other data types, we can skip generating partition stats.
This keeps the user experience seamless and avoids random errors, even at the cost of not indexing those columns.
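A minimal sketch of the proposed guard is shown below. It is illustrative only and does not reflect the actual patch: `PartitionStatsEligibility`, `FieldType`, and `supportsPartitionStats` are hypothetical names, the set of types treated as primitive is an assumption, and so is the choice to skip any column that carries a logical type (such as `date`).

```java
import java.util.EnumSet;
import java.util.Set;

class PartitionStatsEligibility {
    // Hypothetical, simplified stand-in for a column's underlying type.
    enum FieldType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING, RECORD, ARRAY, MAP }

    // Assumption: these are the types considered safe to index.
    private static final Set<FieldType> PRIMITIVES = EnumSet.of(
        FieldType.BOOLEAN, FieldType.INT, FieldType.LONG,
        FieldType.FLOAT, FieldType.DOUBLE, FieldType.STRING);

    // Returns true only for plain primitives. A column with a logical type
    // (e.g. "date" on top of int) is skipped, since its in-memory
    // representation may differ between readers.
    static boolean supportsPartitionStats(FieldType type, String logicalType) {
        return logicalType == null && PRIMITIVES.contains(type);
    }

    public static void main(String[] args) {
        System.out.println(supportsPartitionStats(FieldType.INT, null));
        System.out.println(supportsPartitionStats(FieldType.INT, "date"));
        System.out.println(supportsPartitionStats(FieldType.ARRAY, null));
    }
}
```

Columns rejected by such a check would simply be left out of partition stats, so queries on them lose pruning but never hit a cast error during the merge.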