Hive / HIVE-16868

Query Hint For Primary Key / Foreign Key Joins


Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1, 3.0.0
    • Fix Version/s: None
    • Component/s: Physical Optimizer
    • Labels: None

    Description

      org.apache.hadoop.hive.ql.stats.StatsUtils.java
        /**
         * Based on the provided column statistics and number of rows, this method infers if the column
         * can be primary key. It checks if the difference between the min and max value is equal to
         * number of rows specified.
         * @param numRows - number of rows
         * @param colStats - column statistics
         */
        public static void inferAndSetPrimaryKey(long numRows, List<ColStatistics> colStats) {
          if (colStats != null) {
            for (ColStatistics cs : colStats) {
              if (cs != null && cs.getCountDistint() >= numRows) {
                cs.setPrimaryKey(true);
              }
              else if (cs != null && cs.getRange() != null && cs.getRange().minValue != null &&
                  cs.getRange().maxValue != null) {
                if (numRows ==
                    ((cs.getRange().maxValue.longValue() - cs.getRange().minValue.longValue()) + 1)) {
                  cs.setPrimaryKey(true);
                }
              }
            }
          }
        }
      

      This code is likely to miss many primary-key columns: users may delete rows from their tables over time, which leaves gaps in the key values, and the range check then fails. For example:

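      PK values: 1, 2, 4
      Range = (max - min) + 1 = (4 - 1) + 1 = 4
      Rows = 3
      Range != Rows, so the range check does not mark the column as a primary key.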
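      The range arithmetic above can be reproduced with a small stand-alone Java sketch. It re-creates only the range-based branch of inferAndSetPrimaryKey, using plain longs instead of Hive's ColStatistics; the class and method names (RangeCheckDemo, rangeLooksLikePrimaryKey, check) are hypothetical and are not part of Hive.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-alone re-creation of the range-based branch of
 * StatsUtils.inferAndSetPrimaryKey. Plain longs replace Hive's ColStatistics
 * so the example runs on its own; these names are not Hive APIs.
 */
public class RangeCheckDemo {

    /** True when (max - min) + 1 equals the row count, i.e. the key values look contiguous. */
    static boolean rangeLooksLikePrimaryKey(long numRows, long minValue, long maxValue) {
        return numRows == (maxValue - minValue) + 1;
    }

    /** Derives min, max and the row count from concrete key values, then applies the range check. */
    static boolean check(long[] keys) {
        long min = Arrays.stream(keys).min().getAsLong();
        long max = Arrays.stream(keys).max().getAsLong();
        return rangeLooksLikePrimaryKey(keys.length, min, max);
    }

    public static void main(String[] args) {
        long[] contiguous = {1, 2, 3};   // no deletes: range 3, rows 3
        long[] afterDelete = {1, 2, 4};  // row with key 3 deleted: range 4, rows 3
        System.out.println(check(contiguous));   // true
        System.out.println(check(afterDelete));  // false
    }
}
```

      A single deleted row is enough to break the equality, even though every remaining value is still unique.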
      

      Proposal: allow a query hint with which the user can declare a join to be a PK-FK relationship.
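      One way such a hint could interact with the existing inference is sketched below, purely as an assumption: the hint is modeled as a set of column names declared as primary keys, a hinted column is trusted outright, and any other column falls back to the range heuristic. No hint syntax is proposed here, and ColStats, markPrimaryKeys and PkHintSketch are hypothetical names, not Hive APIs. Only the range-based branch of the heuristic is modeled.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: a user-declared PK-FK hint takes precedence over
 * statistics-based inference. The hint is modeled as a set of column names;
 * none of these classes or methods exist in Hive.
 */
public class PkHintSketch {

    /** Minimal stand-in for per-column statistics (not Hive's ColStatistics). */
    static final class ColStats {
        final String column;
        final long min;
        final long max;
        boolean primaryKey;

        ColStats(String column, long min, long max) {
            this.column = column;
            this.min = min;
            this.max = max;
        }
    }

    /** Only the range-based heuristic from the description is modeled here. */
    static boolean inferredFromRange(long numRows, ColStats cs) {
        return numRows == (cs.max - cs.min) + 1;
    }

    /** A hinted column is trusted as a primary key; other columns fall back to inference. */
    static void markPrimaryKeys(long numRows, List<ColStats> colStats, Set<String> hintedPkColumns) {
        for (ColStats cs : colStats) {
            cs.primaryKey = hintedPkColumns.contains(cs.column) || inferredFromRange(numRows, cs);
        }
    }

    public static void main(String[] args) {
        // Key column "id" holding 1, 2, 4: min 1, max 4, 3 rows.
        ColStats withoutHint = new ColStats("id", 1, 4);
        markPrimaryKeys(3, List.of(withoutHint), Set.of());
        System.out.println(withoutHint.primaryKey); // false: the gap defeats the range check

        ColStats withHint = new ColStats("id", 1, 4);
        markPrimaryKeys(3, List.of(withHint), Set.of("id"));
        System.out.println(withHint.primaryKey);    // true: the hint overrides the heuristic
    }
}
```

      Giving the hint precedence keeps the heuristic as a fallback for unhinted queries while letting the user supply knowledge that statistics cannot recover after deletes.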


People

    • Assignee: Unassigned
    • Reporter: David Mollitor (belugabehr)
    • Votes: 0
    • Watchers: 1
