Details
New Feature
Status: Open
Minor
Resolution: Unresolved
2.1.1, 3.0.0
None
None
Description
org.apache.hadoop.hive.ql.stats.StatsUtils.java
/**
 * Based on the provided column statistics and number of rows, this method infers if the column
 * can be primary key. It checks if the difference between the min and max value is equal to
 * number of rows specified.
 * @param numRows - number of rows
 * @param colStats - column statistics
 */
public static void inferAndSetPrimaryKey(long numRows, List<ColStatistics> colStats) {
  if (colStats != null) {
    for (ColStatistics cs : colStats) {
      if (cs != null && cs.getCountDistint() >= numRows) {
        cs.setPrimaryKey(true);
      } else if (cs != null && cs.getRange() != null && cs.getRange().minValue != null
          && cs.getRange().maxValue != null) {
        if (numRows == ((cs.getRange().maxValue.longValue()
            - cs.getRange().minValue.longValue()) + 1)) {
          cs.setPrimaryKey(true);
        }
      }
    }
  }
}
This code is likely to miss many primary key scenarios: users may delete rows from their tables over time, which leaves gaps in the key values, so the range (max - min + 1) no longer equals the number of rows.
Example: PK values 1, 2, 4. Range = (4 - 1) + 1 = 4, Rows = 3. Since 4 != 3, the range check does not mark the column as a primary key.
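The failure can be reproduced outside Hive with a minimal sketch. The class RangeCheckDemo and its methods are hypothetical names used only for illustration; they mirror just the range branch of inferAndSetPrimaryKey and do not model the distinct-count branch or the real ColStatistics class.

```java
import java.util.Arrays;

public class RangeCheckDemo {
    // Hypothetical stand-in for the range branch of inferAndSetPrimaryKey:
    // true only when the values min..max would exactly fill numRows rows.
    public static boolean rangeSuggestsPrimaryKey(long numRows, long minValue, long maxValue) {
        return numRows == (maxValue - minValue) + 1;
    }

    // Derives row count, min and max from a set of key values and applies the check.
    public static boolean check(long[] keys) {
        long min = Arrays.stream(keys).min().getAsLong();
        long max = Arrays.stream(keys).max().getAsLong();
        return rangeSuggestsPrimaryKey(keys.length, min, max);
    }

    public static void main(String[] args) {
        // Contiguous keys: range = (3 - 1) + 1 = 3 rows, so the check passes.
        System.out.println(check(new long[] {1, 2, 3}));
        // Key 3 deleted: range = (4 - 1) + 1 = 4, but only 3 rows, so the check fails.
        System.out.println(check(new long[] {1, 2, 4}));
    }
}
```

Both key sets are unique, yet only the contiguous one passes, which is the gap this issue describes.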
Allow a query hint that can be used by the user to specify a join as a PK-FK relationship.