  1. Hive
  2. HIVE-15339

Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Depending on the query pattern, FilterSelectivityEstimator fetches column statistics from the metastore in multiple calls. For instance, for the query below, it ends up fetching individual column statistics for flights multiple times.

      When the table has a large number of partitions, fetching column statistics through multiple calls can be very expensive and adversely affects overall compilation time. The following query took 14 seconds to compile.

      SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
      YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
      FROM `flights` as `flights`
      JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
      JOIN `airports` as `source_airport` ON (`flights`.`origin` = `source_airport`.`iata`)
      JOIN `airports` as `dest_airport` ON (`flights`.`dest` = `dest_airport`.`iata`)
      GROUP BY YEAR(`flights`.`dateofflight`);
      

      It may be helpful to collect all columns that need statistics and fetch their statistics in a single remote call.
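
      The batching idea can be illustrated with a minimal Java sketch. Everything in it is hypothetical: StatsClient, CountingClient, fetchDistinctCounts, and the dummy statistic are stand-ins invented for illustration, not Hive's metastore API and not the code in the attached patches. It only contrasts one remote call per column reference with a single call for the de-duplicated column set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BatchedColumnStats {

    /** Hypothetical stand-in for a remote metastore client (not Hive's real API). */
    interface StatsClient {
        Map<String, Long> fetchDistinctCounts(String table, List<String> columns);
    }

    /** Fake client that counts how many "remote" calls were made. */
    static class CountingClient implements StatsClient {
        int calls = 0;

        @Override
        public Map<String, Long> fetchDistinctCounts(String table, List<String> columns) {
            calls++; // each invocation models one round trip to the metastore
            Map<String, Long> out = new LinkedHashMap<>();
            for (String col : columns) {
                out.put(col, (long) col.length()); // dummy value, not a real statistic
            }
            return out;
        }
    }

    /** Unbatched pattern: one remote call per column reference. */
    static Map<String, Long> fetchOneByOne(StatsClient client, String table, List<String> needed) {
        Map<String, Long> stats = new LinkedHashMap<>();
        for (String col : needed) {
            stats.putAll(client.fetchDistinctCounts(table, Collections.singletonList(col)));
        }
        return stats;
    }

    /** Batched pattern: de-duplicate the needed columns, then make a single call. */
    static Map<String, Long> fetchBatched(StatsClient client, String table, List<String> needed) {
        Set<String> unique = new LinkedHashSet<>(needed);
        return client.fetchDistinctCounts(table, new ArrayList<>(unique));
    }

    public static void main(String[] args) {
        // Columns of flights used by the query; "origin" is repeated to mimic
        // an estimator asking for the same column more than once.
        List<String> needed = Arrays.asList(
            "flightnum", "dateofflight", "uniquecarrier", "origin", "dest", "origin");

        CountingClient perColumn = new CountingClient();
        fetchOneByOne(perColumn, "flights", needed);

        CountingClient batched = new CountingClient();
        fetchBatched(batched, "flights", needed);

        System.out.println("per-column calls: " + perColumn.calls);
        System.out.println("batched calls:    " + batched.calls);
    }
}
```

      In this toy run the per-column path makes six calls (one per column reference) and the batched path makes one, while both return the same map. With a real metastore and a heavily partitioned table, the saving would depend on how expensive each round trip is.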

      Attachments

        1. HIVE-15339.6.patch
          7 kB
          Rajesh Balamohan
        2. HIVE-15339.5.patch
          6 kB
          Rajesh Balamohan
        3. HIVE-15339.4.patch
          7 kB
          Rajesh Balamohan
        4. HIVE-15339.3.patch
          2 kB
          Rajesh Balamohan
        5. HIVE-15339.1.patch
          1 kB
          Rajesh Balamohan

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Rajesh Balamohan (rajesh.balamohan)