IMPALA-11953: num_trues and num_falses in TIntermediateColumnStats should be optional

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
    • Fix Version/s: Impala 4.1.2, Impala 4.3.0
    • Component/s: Frontend
    • Labels: None

    Description

      IMPALA-8205 added two required fields to TIntermediateColumnStats:

      struct TIntermediateColumnStats {
         // One byte for each bucket of the NDV HLL computation
        1: optional binary intermediate_ndv
      
        // If true, intermediate_ndv is RLE-compressed
        2: optional bool is_ndv_encoded
      
        // Number of nulls seen so far (or -1 if nulls are not counted)
        3: optional i64 num_nulls
      
        // The maximum width, in bytes, of the column
        4: optional i32 max_width
      
        // The average width (in bytes) of the column
        5: optional double avg_width
      
        // The number of rows counted, needed to compute NDVs from intermediate_ndv
        6: optional i64 num_rows
      +
      +  // The number of true and false value, of the column
      +  7: required i64 num_trues
      +  8: required i64 num_falses
       }

      TIntermediateColumnStats is the Thrift representation of incremental stats, which are stored in HMS partition properties under keys such as "impala_intermediate_stats_num_chunks", "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1", "impala_intermediate_stats_chunk2", and so on.
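
      To make the key layout concrete, here is a minimal sketch that reassembles the chunked value from a partition's properties. It assumes the chunks are simply concatenated in index order; ChunkedStatsReader and joinChunks are hypothetical names for illustration, not Impala's actual code, and the decoding and Thrift deserialization of the joined string are not shown.

```java
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Impala. */
class ChunkedStatsReader {
  // Property keys as quoted in this issue.
  static final String NUM_CHUNKS_KEY = "impala_intermediate_stats_num_chunks";
  static final String CHUNK_KEY_PREFIX = "impala_intermediate_stats_chunk";

  /**
   * Returns the chunks concatenated in index order, or null if the
   * partition properties carry no incremental stats at all.
   * Assumption: chunk i is stored under CHUNK_KEY_PREFIX + i.
   */
  static String joinChunks(Map<String, String> params) {
    String numChunksStr = params.get(NUM_CHUNKS_KEY);
    if (numChunksStr == null) return null;

    int numChunks = Integer.parseInt(numChunksStr.trim());
    StringBuilder joined = new StringBuilder();
    for (int i = 0; i < numChunks; i++) {
      String chunk = params.get(CHUNK_KEY_PREFIX + i);
      if (chunk == null) {
        throw new IllegalStateException(
            "Missing partition property: " + CHUNK_KEY_PREFIX + i);
      }
      joined.append(chunk);
    }
    return joined.toString();
  }
}
```

      Because the serialized stats live in HMS rather than in Impala, they survive an upgrade unchanged, which is why a schema change to the struct affects data that already exists.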

      After upgrading Impala to 4.0, incremental stats written by earlier versions can't be parsed because the serialized data does not contain these fields.

      W0227 09:06:49.057451 31105 HdfsPartition.java:1337] Failed to set partition stats for table reptest.test partition loaddate=2022
      Java exception follows:
      org.apache.impala.common.InternalException: Required field 'num_trues' was not found in serialized data! Struct: org.apache.impala.thrift.TIntermediateColumnStats$TIntermediateColumnStatsStandardScheme@377da96a
      	at org.apache.impala.common.JniUtil.deserializeThrift(JniUtil.java:138)
      	at org.apache.impala.catalog.PartitionStatsUtil.partStatsBytesFromParameters(PartitionStatsUtil.java:114)
      	at org.apache.impala.catalog.HdfsPartition$Builder.extractAndCompressPartStats(HdfsPartition.java:1334)
      	at org.apache.impala.catalog.HdfsPartition$Builder.setMsPartition(HdfsPartition.java:1310)
      	at org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder(HdfsTable.java:906)
      	at org.apache.impala.catalog.HdfsTable.createPartitionBuilder(HdfsTable.java:895)
      	at org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:698)
      	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1244)
      	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1138)
      	at org.apache.impala.catalog.TableLoader.load(TableLoader.java:114)
      	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
      	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)

      numTrues and numFalses are not used in planning, so we should change both fields to optional to unblock the upgrade.
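
      The difference between the two requiredness levels can be illustrated with a toy model. This is not Thrift and not Impala code: the serialized struct is modelled as a plain map from field id to value, and FieldPresenceDemo, readRequired and readOptional are hypothetical names. It only shows why data written before fields 7 and 8 existed fails a required-field check but is accepted when the fields are optional.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Toy model of field presence checks; hypothetical, not generated Thrift code. */
class FieldPresenceDemo {
  /** A required field: absence from the serialized data is an error. */
  static long readRequired(Map<Integer, Long> fields, int id, String name) {
    Long value = fields.get(id);
    if (value == null) {
      throw new IllegalStateException(
          "Required field '" + name + "' was not found in serialized data!");
    }
    return value;
  }

  /** An optional field: absence is reported as "unset" and the caller decides. */
  static OptionalLong readOptional(Map<Integer, Long> fields, int id) {
    Long value = fields.get(id);
    return value == null ? OptionalLong.empty() : OptionalLong.of(value);
  }

  public static void main(String[] args) {
    // Stats as an older release might have written them: no fields 7 and 8.
    Map<Integer, Long> oldStats = Map.of(3, 0L, 6, 100L);

    // Optional read succeeds and reports the field as unset.
    System.out.println(readOptional(oldStats, 7).isPresent());

    // Required read rejects the same data.
    try {
      readRequired(oldStats, 7, "num_trues");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

      Since the planner does not consume these two values, treating them as unset for old partitions loses nothing that was previously available.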

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)