  1. Hive
  2. HIVE-6540

Support Multi Column Stats

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      For joins with compound predicates, multi-column statistics can be used to compute the number of distinct values (NDV) accurately.

      The objective is to compute the NDV of a combination of two or more columns, for example NDV(x, y, z).

      For an inner join R1 IJ R2 on R1.x = R2.x and R1.y = R2.y and R1.z = R2.z, the optimizer can use max(NDV(R1.x, R1.y, R1.z), NDV(R2.x, R2.y, R2.z)) as the join NDV, and hence to derive the join selectivity.
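
      A minimal sketch of why the combined NDV matters. This is not Hive code. It assumes the textbook estimate |R1| * |R2| / max(NDV) for an equi-join, computes NDVs exactly from in-memory rows, and uses hypothetical helper names (ndv, join_cardinality, join_cardinality_independent). It compares the multi-column NDV against the fallback of multiplying per-column NDVs as if the columns were independent.

```python
def ndv(rows, cols):
    """Exact number of distinct value combinations over the given column indexes."""
    return len({tuple(row[c] for c in cols) for row in rows})


def join_cardinality(r1, r2, cols):
    """Estimate |R1 JOIN R2| as |R1| * |R2| / max(NDV(R1.cols), NDV(R2.cols))."""
    denom = max(ndv(r1, cols), ndv(r2, cols))
    return len(r1) * len(r2) / denom if denom else 0.0


def join_cardinality_independent(r1, r2, cols):
    """Same estimate, but multiplying per-column NDVs (independence assumption)."""
    denom = 1
    for c in cols:
        denom *= max(ndv(r1, [c]), ndv(r2, [c]))
    return len(r1) * len(r2) / denom if denom else 0.0


# Illustrative data: x, y and z are perfectly correlated in both relations.
r1 = [(i % 10, i % 10, i % 10) for i in range(100)]
r2 = [(i % 10, i % 10, i % 10) for i in range(50)]
cols = [0, 1, 2]  # positions of x, y, z

multi = join_cardinality(r1, r2, cols)              # uses NDV(x, y, z)
indep = join_cardinality_independent(r1, r2, cols)  # uses NDV(x) * NDV(y) * NDV(z)
print(multi, indep)
```

      On this data the multi-column estimate matches the true join size (500 rows), while the per-column product underestimates it by a factor of 100, because it treats 10 distinct (x, y, z) combinations as 1000.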

      http://www.oracle-base.com/articles/11g/statistics-collection-enhancements-11gr1.php#multi_column_statistics
      http://blogs.msdn.com/b/ianjo/archive/2005/11/10/491548.aspx
      http://developer.teradata.com/database/articles/removing-multi-column-statistics-a-process-for-identification-of-redundant-statist

          People

            Assignee: Laljo John Pullokkaran (jpullokkaran)
            Reporter: Laljo John Pullokkaran (jpullokkaran)
            Votes: 0
            Watchers: 3
