Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-14265

Partial stats in Join operator may lead to data size estimate of 0

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 2.1.1, 2.2.0
    • Statistics
    • None

    Description

      For some tables, we might not have the column stats available. However, if the table is partitioned, we will have the stats for partition columns.

      When we estimate the size of the data produced by a join operator, we end up using only the columns that are available for the calculation e.g. partition columns in this case.

      However, even in these cases, we should add the data size for those columns for which we do not have stats (default size for the column type x estimated number of rows).

      To reproduce, the following example can be used:

      create table sample_partitioned (x int) partitioned by (y int);
      insert into sample_partitioned partition(y=1) values (1),(2);
      create temporary table sample as select * from sample_partitioned;
      analyze table sample compute statistics for columns;
      
      explain select sample_partitioned.x from sample_partitioned, sample where sample.y = sample_partitioned.y;
      

      Attachments

        1. HIVE-14265.patch
          9 kB
          jcamachorodriguez

        Activity

          People

            jcamacho Jesús Camacho Rodríguez
            ndembla Nita Dembla
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: