Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-8409

STRINGs without stats have too low row-size in explain plan

    XMLWordPrintableJSON

Details

    • ghx-label-6

    Description

      STRING columns without avg_size statistic are calculated into the row-size as 11 bytes, while they take 12 bytes in the tuple (+ more somewhere in the memory if they are not empty). The issue is caused by adding -1 (meaning unknown) to the 12 byte slot size.

      I think that this doesn't cause problems, as the estimation is probably way off without statistics anyway, but row-size >= tuple size seems like a meaningful invariant that we shouldn't break.

      Reproduce:

      create table test_row_size (s string);
      explain select * from test_row_size; 
      Result:
      ...
      WARNING: The following tables are missing relevant table and/or column statistics.
      default.test_row_size
      ...
      00:SCAN HDFS [default.test_row_size]
         partitions=1/1 files=0 size=0B
         row-size=11B cardinality=0
      

      Attachments

        Issue Links

          Activity

            People

              csringhofer Csaba Ringhofer
              csringhofer Csaba Ringhofer
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: