Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
Description
IMPALA-8205 adds two required fields to TIntermediateColumnStats:
 struct TIntermediateColumnStats {
   // One byte for each bucket of the NDV HLL computation
   1: optional binary intermediate_ndv

   // If true, intermediate_ndv is RLE-compressed
   2: optional bool is_ndv_encoded

   // Number of nulls seen so far (or -1 if nulls are not counted)
   3: optional i64 num_nulls

   // The maximum width, in bytes, of the column
   4: optional i32 max_width

   // The average width (in bytes) of the column
   5: optional double avg_width

   // The number of rows counted, needed to compute NDVs from intermediate_ndv
   6: optional i64 num_rows
+
+  // The number of true and false value, of the column
+  7: required i64 num_trues
+  8: required i64 num_falses
 }
TIntermediateColumnStats is the representation of incremental stats which are stored in HMS partition properties using keys like "impala_intermediate_stats_num_chunks" and "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1", "impala_intermediate_stats_chunk2", etc.
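The chunked key layout described above can be sketched as follows. This is a minimal illustration, not Impala's actual code: the class and method names (IntermediateStatsChunks, joinChunks) are hypothetical, only the two property key names come from this report, and whatever decoding and Thrift deserialization happens after concatenation is deliberately left out.

```java
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Impala.
final class IntermediateStatsChunks {
    // Key names as quoted in this report.
    static final String NUM_CHUNKS_KEY = "impala_intermediate_stats_num_chunks";
    static final String CHUNK_KEY_PREFIX = "impala_intermediate_stats_chunk";

    /**
     * Concatenates chunk0..chunkN-1 from the partition parameters.
     * Returns null when the partition carries no incremental stats.
     */
    static String joinChunks(Map<String, String> partitionParams) {
        String numChunksStr = partitionParams.get(NUM_CHUNKS_KEY);
        if (numChunksStr == null) {
            return null; // no incremental stats stored for this partition
        }
        int numChunks = Integer.parseInt(numChunksStr.trim());
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < numChunks; i++) {
            String chunk = partitionParams.get(CHUNK_KEY_PREFIX + i);
            if (chunk == null) {
                throw new IllegalStateException("Missing chunk " + i);
            }
            payload.append(chunk);
        }
        return payload.toString();
    }
}
```

The point of the sketch is that the serialized struct lives in HMS, outside Impala's binaries, so it survives an upgrade unchanged and has to be readable by the newer struct definition.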
After upgrading Impala to 4.0, existing incremental stats can't be parsed because the stored data is missing these fields.
W0227 09:06:49.057451 31105 HdfsPartition.java:1337] Failed to set partition stats for table reptest.test partition loaddate=2022
Java exception follows:
org.apache.impala.common.InternalException: Required field 'num_trues' was not found in serialized data! Struct: org.apache.impala.thrift.TIntermediateColumnStats$TIntermediateColumnStatsStandardScheme@377da96a
    at org.apache.impala.common.JniUtil.deserializeThrift(JniUtil.java:138)
    at org.apache.impala.catalog.PartitionStatsUtil.partStatsBytesFromParameters(PartitionStatsUtil.java:114)
    at org.apache.impala.catalog.HdfsPartition$Builder.extractAndCompressPartStats(HdfsPartition.java:1334)
    at org.apache.impala.catalog.HdfsPartition$Builder.setMsPartition(HdfsPartition.java:1310)
    at org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder(HdfsTable.java:906)
    at org.apache.impala.catalog.HdfsTable.createPartitionBuilder(HdfsTable.java:895)
    at org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:698)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1244)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1138)
    at org.apache.impala.catalog.TableLoader.load(TableLoader.java:114)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
numTrues and numFalses are not used in planning, so we should change them to optional to unblock the migration.
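To illustrate why switching the fields to optional helps, here is a toy model of the two behaviors. It is a sketch under stated assumptions, not Thrift's generated code: the serialized struct is modeled as a plain map from field id to value, and ColumnStatsCompat, readRequired and readOptional are hypothetical names. Field ids 3, 6 and 7 match the struct definition above.

```java
import java.util.Map;
import java.util.Optional;

// Toy model only: a serialized struct is represented as fieldId -> value.
// All names here are hypothetical and not part of Thrift or Impala.
final class ColumnStatsCompat {
    /** Models a `required` field: an absent value is a hard error. */
    static long readRequired(Map<Integer, Long> fields, int id, String name) {
        Long value = fields.get(id);
        if (value == null) {
            throw new IllegalStateException(
                "Required field '" + name + "' was not found in serialized data!");
        }
        return value;
    }

    /** Models an `optional` field: an absent value is simply left unset. */
    static Optional<Long> readOptional(Map<Integer, Long> fields, int id) {
        return Optional.ofNullable(fields.get(id));
    }
}
```

With data written before the upgrade, which in this model has entries for ids 3 (num_nulls) and 6 (num_rows) but none for 7 or 8, readRequired(fields, 7, "num_trues") throws while readOptional(fields, 7) returns an empty Optional, so the remaining stats stay usable.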
Issue Links
- relates to IMPALA-8205 Illegal statistics for numFalse and numTrue (Resolved)