Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16811

Estimate statistics in absence of stats

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels:

      Description

      Currently Join ordering completely bails out in absence of statistics and this could lead to bad joins such as cross joins.
      e.g. following select query will produce cross join.

      create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, S_NATIONKEY INT, 
      S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
      
      CREATE TABLE lineitem (L_ORDERKEY      INT,
                                      L_PARTKEY       INT,
                                      L_SUPPKEY       INT,
                                      L_LINENUMBER    INT,
                                      L_QUANTITY      DOUBLE,
                                      L_EXTENDEDPRICE DOUBLE,
                                      L_DISCOUNT      DOUBLE,
                                      L_TAX           DOUBLE,
                                      L_RETURNFLAG    STRING,
                                      L_LINESTATUS    STRING,
                                      l_shipdate      STRING,
                                      L_COMMITDATE    STRING,
                                      L_RECEIPTDATE   STRING,
                                      L_SHIPINSTRUCT  STRING,
                                      L_SHIPMODE      STRING,
                                      L_COMMENT       STRING) partitioned by (dl int)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '|';
      
      
      CREATE TABLE part(
          p_partkey INT,
          p_name STRING,
          p_mfgr STRING,
          p_brand STRING,
          p_type STRING,
          p_size INT,
          p_container STRING,
          p_retailprice DOUBLE,
          p_comment STRING
      );
      
      explain select count(1) from part,supplier,lineitem where p_partkey = l_partkey and s_suppkey = l_suppkey;
      
      

      Estimating stats will prevent join ordering algorithm to bail out and come up with join at least better than cross join

        Attachments

        1. HIVE-16811.1.patch
          30 kB
          Vineet Garg
        2. HIVE-16811.10.patch
          6.11 MB
          Vineet Garg
        3. HIVE-16811.11.patch
          6.10 MB
          Vineet Garg
        4. HIVE-16811.12.patch
          6.11 MB
          Vineet Garg
        5. HIVE-16811.13.patch
          6.29 MB
          Vineet Garg
        6. HIVE-16811.14.patch
          6.35 MB
          Vineet Garg
        7. HIVE-16811.15.patch
          6.37 MB
          Vineet Garg
        8. HIVE-16811.16.patch
          6.39 MB
          Vineet Garg
        9. HIVE-16811.17.patch
          6.39 MB
          Vineet Garg
        10. HIVE-16811.18.patch
          6.39 MB
          Vineet Garg
        11. HIVE-16811.19.patch
          6.39 MB
          Vineet Garg
        12. HIVE-16811.2.patch
          2.20 MB
          Vineet Garg
        13. HIVE-16811.3.patch
          2.20 MB
          Vineet Garg
        14. HIVE-16811.4.patch
          7.83 MB
          Vineet Garg
        15. HIVE-16811.5.patch
          6.83 MB
          Vineet Garg
        16. HIVE-16811.6.patch
          6.91 MB
          Vineet Garg
        17. HIVE-16811.7.patch
          6.91 MB
          Vineet Garg
        18. HIVE-16811.8.patch
          6.11 MB
          Vineet Garg
        19. HIVE-16811.9.patch
          6.11 MB
          Vineet Garg

          Activity

            People

            • Assignee:
              vgarg Vineet Garg
              Reporter:
              vgarg Vineet Garg
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: