IMPALA-8409

STRINGs without stats have too low row-size in explain plan

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.2.0
    • Fix Version/s: None
    • Component/s: Frontend

      Description

      STRING columns without an avg_size statistic contribute 11 bytes to the row-size estimate, although they occupy 12 bytes in the tuple (plus additional memory elsewhere if they are not empty). The cause is that -1 (the value meaning "unknown") is added to the 12-byte slot size.

      I don't think this causes problems in practice, since the estimate is probably far off without statistics anyway, but row-size >= tuple size seems like a meaningful invariant that we shouldn't break.
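
The arithmetic behind the 11-byte figure can be illustrated with a minimal sketch. This is not Impala's frontend code (which is written in Java); the function and constant names below are hypothetical, and clamping the unknown value to zero is only one possible way to restore the invariant, not a fix confirmed by this issue.

```python
# Hypothetical names for illustration only; not Impala's actual code.
STRING_SLOT_SIZE = 12   # bytes a STRING slot occupies in the tuple, per the report
UNKNOWN_AVG_SIZE = -1   # value meaning "no avg_size statistic available"


def row_size_as_reported(avg_size: int) -> int:
    """Adds the average size to the slot size even when it is the -1 sentinel."""
    return STRING_SLOT_SIZE + avg_size


def row_size_clamped(avg_size: int) -> int:
    """Assumed fix: treat an unknown (negative) average size as 0."""
    return STRING_SLOT_SIZE + max(avg_size, 0)


print(row_size_as_reported(UNKNOWN_AVG_SIZE))  # 11, below the 12-byte slot size
print(row_size_clamped(UNKNOWN_AVG_SIZE))      # 12, invariant holds
```

With the clamp, the estimated row size can never fall below the slot size, which is the row-size >= tuple size invariant mentioned above.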

      Reproduce:

      create table test_row_size (s string);
      explain select * from test_row_size;

      Result:
      ...
      WARNING: The following tables are missing relevant table and/or column statistics.
      default.test_row_size
      ...
      00:SCAN HDFS [default.test_row_size]
         partitions=1/1 files=0 size=0B
         row-size=11B cardinality=0
      

              People

              • Assignee: Csaba Ringhofer (csringhofer)
              • Reporter: Csaba Ringhofer (csringhofer)
