IMPALA-8409

STRINGs without stats have too low row-size in explain plan

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.2.0
    • Fix Version/s: None
    • Component/s: Frontend

      Description

      STRING columns without the avg_size statistic contribute 11 bytes to row-size, although each takes 12 bytes in the tuple (plus additional memory elsewhere if the string is not empty). The cause is that -1 (meaning unknown) is added to the 12-byte slot size.

      I think that this doesn't cause problems, as the estimation is probably way off without statistics anyway, but row-size >= tuple size seems like a meaningful invariant that we shouldn't break.
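
The arithmetic described above can be sketched as follows. This is only an illustration in Python, not Impala's frontend code: the function names and constants are hypothetical, and clamping the unknown value to zero is just one possible way to keep row-size >= tuple size, not necessarily the fix that will be chosen.

```python
# Illustrative sketch only; names are hypothetical and not taken from Impala.
STRING_SLOT_SIZE = 12   # bytes a STRING slot takes in the tuple (from the description)
UNKNOWN_AVG_SIZE = -1   # sentinel meaning "no avg_size statistic available"


def buggy_row_size(slot_size, avg_size):
    """Adds the avg_size sentinel directly, reproducing the off-by-one."""
    return slot_size + avg_size


def guarded_row_size(slot_size, avg_size):
    """Treats an unknown (negative) avg_size as 0 extra bytes.

    Assumption: clamping is one possible guard, chosen here only to
    show how the invariant row-size >= tuple size can be preserved.
    """
    return slot_size + max(avg_size, 0)


if __name__ == "__main__":
    print(buggy_row_size(STRING_SLOT_SIZE, UNKNOWN_AVG_SIZE))    # 11
    print(guarded_row_size(STRING_SLOT_SIZE, UNKNOWN_AVG_SIZE))  # 12
```

With the guard, a column that does have an avg_size statistic is unaffected, since the clamp only changes negative values.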

      Reproduce:

      create table test_row_size (s string);
      explain select * from test_row_size;

      Result:
      ...
      WARNING: The following tables are missing relevant table and/or column statistics.
      default.test_row_size
      ...
      00:SCAN HDFS [default.test_row_size]
         partitions=1/1 files=0 size=0B
         row-size=11B cardinality=0
      

            People

            • Assignee:
              csringhofer Csaba Ringhofer
              Reporter:
              csringhofer Csaba Ringhofer