diff --git data/files/customer_address.txt data/files/customer_address.txt new file mode 100644 index 0000000..81e05af --- /dev/null +++ data/files/customer_address.txt @@ -0,0 +1,20 @@ +1|AAAAAAAABAAAAAAA|18|Jackson |Parkway|Suite 280|Fairfield|Maricopa County|AZ|86192|United States|-7|condo| +2|AAAAAAAACAAAAAAA|362|Washington 6th|RD|Suite 80|Fairview|Taos County|NM|85709|United States|-7|condo| +3|AAAAAAAADAAAAAAA|585|Dogwood Washington|Circle|Suite Q|Pleasant Valley|York County|PA|12477|United States|-5|single family| +4|AAAAAAAAEAAAAAAA|111|Smith |Wy|Suite A|Oak Ridge|Kit Carson County|CO|88371|United States|-7|condo| +5|AAAAAAAAFAAAAAAA|31|College |Blvd|Suite 180|Glendale|Barry County|MO|63951|United States|-6|single family| +6|AAAAAAAAGAAAAAAA|59|Williams Sixth|Parkway|Suite 100|Lakeview|Chelan County|WA|98579|United States|-8|single family| +7|AAAAAAAAHAAAAAAA||Hill 7th|Road|Suite U|Farmington|||39145|United States||| +8|AAAAAAAAIAAAAAAA|875|Lincoln |Ct.|Suite Y|Union|Bledsoe County|TN|38721|United States|-5|apartment| +9|AAAAAAAAJAAAAAAA|819|1st Laurel|Ave|Suite 70|New Hope|Perry County|AL|39431|United States|-6|condo| +10|AAAAAAAAKAAAAAAA|851|Woodland Poplar|ST|Suite Y|Martinsville|Haines Borough|AK|90419|United States|-9|condo| +11|AAAAAAAALAAAAAAA|189|13th 2nd|Street|Suite 470|Maple Grove|Madison County|MT|68252|United States|-7|single family| +12|AAAAAAAAMAAAAAAA|76|Ash 8th|Ct.|Suite O|Edgewood|Mifflin County|PA|10069|United States|-5|apartment| +13|AAAAAAAANAAAAAAA|424|Main Second|Ln|Suite 130|Greenville|Noxubee County|MS|51387|United States|-6|single family| +14|AAAAAAAAOAAAAAAA|923|Pine Oak|Dr.|Suite 100||Lipscomb County|TX|77752||-6|| +15|AAAAAAAAPAAAAAAA|314|Spring |Ct.|Suite B|Oakland|Washington County|OH|49843|United States|-5|apartment| +16|AAAAAAAAABAAAAAA|576|Adams Center|Street|Suite J|Valley View|Oldham County|TX|75124|United States|-6|condo| +17|AAAAAAAABBAAAAAA|801|Green |Dr.|Suite 0|Montpelier|Richland County|OH|48930|United States|-5|single family| +18|AAAAAAAACBAAAAAA|460|Maple Spruce|Court|Suite 480|Somerville|Potter County|SD|57783|United States|-7|condo| +19|AAAAAAAADBAAAAAA|611|Wilson |Way|Suite O|Oakdale|Tangipahoa Parish|LA|79584|United States|-6|apartment| +20|AAAAAAAAEBAAAAAA|675|Elm Wilson|Street|Suite I|Hopewell|Williams County|OH|40587|United States|-5|condo| diff --git data/files/store.txt data/files/store.txt new file mode 100644 index 0000000..078bafd --- /dev/null +++ data/files/store.txt @@ -0,0 +1,12 @@ +1|AAAAAAAABAAAAAAA|1997-03-13||2451189|ought|245|5250760|8AM-4PM|William Ward|2|Unknown|Enough high areas stop expectations. Elaborate, local is|Charles Bartley|1|Unknown|1|Unknown|767|Spring |Wy|Suite 250|Midway|Williamson County|TN|31904|United States|-5|0.03| +2|AAAAAAAACAAAAAAA|1997-03-13|2000-03-12||able|236|5285950|8AM-4PM|Scott Smith|8|Unknown|Parliamentary candidates wait then heavy, keen mil|David Lamontagne|1|Unknown|1|Unknown|255|Sycamore |Dr.|Suite 410|Midway|Williamson County|TN|31904|United States|-5|0.03| +3|AAAAAAAACAAAAAAA|2000-03-13|||able|236|7557959|8AM-4PM|Scott Smith|7|Unknown|Impossible, true arms can treat constant, complete w|David Lamontagne|1|Unknown|1|Unknown|877|Park Laurel|Road|Suite T|Midway|Williamson County|TN|31904|United States|-5|0.03| +4|AAAAAAAAEAAAAAAA|1997-03-13|1999-03-13|2451044|ese|218|9341467|8AM-4PM|Edwin Adams|4|Unknown|Events would achieve other, eastern hours. 
Mechanisms must not eat other, new org|Thomas Pollack|1|Unknown|1|Unknown|27|Lake |Ln|Suite 260|Midway|Williamson County|TN|31904|United States|-5|0.03| +5|AAAAAAAAEAAAAAAA|1999-03-14|2001-03-12|2450910|anti|288|9078805|8AM-4PM|Edwin Adams|8|Unknown|Events would achieve other, eastern hours. Mechanisms must not eat other, new org|Thomas Pollack|1|Unknown|1|Unknown|27|Lee 6th|Court|Suite 80|Fairview|Williamson County|TN|35709|United States|-5|0.03| +6|AAAAAAAAEAAAAAAA|2001-03-13|||cally|229|9026222|8AM-4PM|Edwin Adams|10|Unknown|Events would achieve other, eastern hours. Mechanisms must not eat other, new org|Thomas Pollack|1|Unknown|1|Unknown|220|6th |Lane|Suite 140|Midway|Williamson County|TN|31904|United States|-5|0.03| +7|AAAAAAAAHAAAAAAA|1997-03-13|||ation|297|8954883|8AM-4PM|David Thomas|9|Unknown|Architects coul|Thomas Benton|1|Unknown|1|Unknown|811|Lee |Circle|Suite T|Midway|Williamson County|TN|31904|United States|-5|0.01| +8|AAAAAAAAIAAAAAAA|1997-03-13|2000-03-12||eing|278|6995995|8AM-4PM|Brett Yates|2|Unknown|Various bars make most. Difficult levels introduce at a boots. Buildings welcome only never el|Dean Morrison|1|Unknown|1|Unknown|226|12th |Lane|Suite D|Fairview|Williamson County|TN|35709|United States|-5|0.08| +9|AAAAAAAAIAAAAAAA|2000-03-13|||eing|271|6995995|8AM-4PM|Brett Yates|2|Unknown|Formal, psychological pounds relate reasonable, young principles. Black, |Dean Morrison|1|Unknown|1|Unknown|226|Hill |Boulevard|Suite 190|Midway|Williamson County|TN|31904|United States|-5|0.08| +10|AAAAAAAAKAAAAAAA|1997-03-13|1999-03-13||bar|294|9294113|8AM-4PM|Raymond Jacobs|8|Unknown|Little expectations include yet forward meetings.|Michael Wilson|1|Unknown|1|Unknown|175|4th |Court|Suite C|Midway|Williamson County|TN|31904|United States|-5|0.06| +11|AAAAAAAAKAAAAAAA|1999-03-14|2001-03-12||ought|294|9294113|8AM-4PM|Raymond Jacobs|6|Unknown|Mysterious employe|Michael Wilson|1|Unknown|1|Unknown|175|Park Green|Court|Suite 160|Midway|Williamson County|TN|31904|United States|-5|0.11| +12|AAAAAAAAKAAAAAAA|2001-03-13|||ought|294|5219562|8AM-12AM|Robert Thompson|6|Unknown|Events develop i|Dustin Kelly|1|Unknown|1|Unknown|337|College |Boulevard|Suite 100|Fairview|Williamson County|TN|31904|United States|-5|0.01| diff --git ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java index a147a45..1feb1fd 100644 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java +++ ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java @@ -1021,11 +1021,17 @@ private long applyGBYRule(long numRows, long dvProd) { */ public static class JoinStatsRule extends DefaultStatsRule implements NodeProcessor { + private boolean pkfkInferred = false; + private long newNumRows = 0; + private List<Operator<? extends OperatorDesc>> parents; + private CommonJoinOperator<? extends JoinDesc> jop; + private int numAttr = 1; + @Override public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object...
nodeOutputs) throws SemanticException { - CommonJoinOperator<? extends JoinDesc> jop = (CommonJoinOperator<? extends JoinDesc>) nd; - List<Operator<? extends OperatorDesc>> parents = jop.getParentOperators(); + jop = (CommonJoinOperator<? extends JoinDesc>) nd; + parents = jop.getParentOperators(); AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx; HiveConf conf = aspCtx.getConf(); boolean allStatsAvail = true; @@ -1052,22 +1058,25 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Statistics stats = new Statistics(); Map<String, Long> rowCountParents = new HashMap<String, Long>(); List<Long> distinctVals = Lists.newArrayList(); - - // 2 relations, multiple attributes - boolean multiAttr = false; - int numAttr = 1; int numParent = parents.size(); - Map<String, ColStatistics> joinedColStats = Maps.newHashMap(); Map<Integer, List<String>> joinKeys = Maps.newHashMap(); List<Long> rowCounts = Lists.newArrayList(); + // detect if there are multiple attributes in join key + ReduceSinkOperator rsOp = (ReduceSinkOperator) jop.getParentOperators().get(0); + List<ExprNodeDesc> keyExprs = rsOp.getConf().getKeyCols(); + numAttr = keyExprs.size(); + + // infer PK-FK relationship in single attribute join case + inferPKFKRelationship(); + // get the join keys from parent ReduceSink operators for (int pos = 0; pos < parents.size(); pos++) { ReduceSinkOperator parent = (ReduceSinkOperator) jop.getParentOperators().get(pos); Statistics parentStats = parent.getStatistics(); - List<ExprNodeDesc> keyExprs = parent.getConf().getKeyCols(); + keyExprs = parent.getConf().getKeyCols(); // Parent RS may have column statistics from multiple parents. // Populate table alias to row count map, this will be used later to @@ -1082,12 +1091,6 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, } rowCounts.add(parentStats.getNumRows()); - // multi-attribute join key - if (keyExprs.size() > 1) { - multiAttr = true; - numAttr = keyExprs.size(); - } - // compute fully qualified join key column names. this name will be // used to quickly look-up for column statistics of join key. // TODO: expressions in join condition will be ignored. assign @@ -1110,7 +1113,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, // attribute join, else max(V(R,y1), V(S,y1)) * max(V(R,y2), V(S,y2)) // in case of multi-attribute join long denom = 1; - if (multiAttr) { + if (numAttr > 1) { List<Long> perAttrDVs = Lists.newArrayList(); for (int idx = 0; idx < numAttr; idx++) { for (Integer i : joinKeys.keySet()) { @@ -1149,9 +1152,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, } // Update NDV of joined columns to be min(V(R,y), V(S,y)) - if (multiAttr) { - updateJoinColumnsNDV(joinKeys, joinedColStats, numAttr); - } + updateJoinColumnsNDV(joinKeys, joinedColStats, numAttr); // column statistics from different sources are put together and rename // fully qualified column names based on output schema of join operator @@ -1181,10 +1182,9 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, // update join statistics stats.setColumnStats(outColStats); - long newRowCount = computeNewRowCount(rowCounts, denom); + long newRowCount = pkfkInferred ?
newNumRows : computeNewRowCount(rowCounts, denom); - updateStatsForJoinType(stats, newRowCount, jop, rowCountParents, - outInTabAlias); + updateStatsForJoinType(stats, newRowCount, jop, rowCountParents, outInTabAlias); jop.setStatistics(stats); if (isDebugEnabled) { @@ -1229,6 +1229,146 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, return null; } + private void inferPKFKRelationship() { + if (numAttr == 1) { + List<Integer> parentsWithPK = getPrimaryKeyCandidates(parents); + + // in case of a fact table joining multiple dimension tables, the join key in the fact + // table will mostly be a foreign key with a corresponding primary key in each dimension + // table. The selectivity of the fact table in that case will be the product of the + // selectivities of all dimension tables (assumes conjunctivity) + for (Integer id : parentsWithPK) { + ColStatistics csPK = null; + Operator<? extends OperatorDesc> parent = parents.get(id); + for (ColStatistics cs : parent.getStatistics().getColumnStats()) { + if (cs.isPrimaryKey()) { + csPK = cs; + break; + } + } + + // infer foreign key candidate positions + List<Integer> parentsWithFK = getForeignKeyCandidates(parents, csPK); + if (parentsWithFK.size() == 1 && + parentsWithFK.size() + parentsWithPK.size() == parents.size()) { + Operator<? extends OperatorDesc> parentWithFK = parents.get(parentsWithFK.get(0)); + List<Float> parentsSel = getSelectivity(parents, parentsWithPK); + Float prodSelectivity = 1.0f; + for (Float selectivity : parentsSel) { + prodSelectivity *= selectivity; + } + newNumRows = (long) (parentWithFK.getStatistics().getNumRows() * prodSelectivity); + pkfkInferred = true; + + // some debug information + if (isDebugEnabled) { + List<String> parentIds = Lists.newArrayList(); + + // print primary key containing parents + for (Integer i : parentsWithPK) { + parentIds.add(parents.get(i).toString()); + } + LOG.debug("STATS-" + jop.toString() + ": PK parent id(s) - " + parentIds); + parentIds.clear(); + + // print foreign key containing parents + for (Integer i : parentsWithFK) { + parentIds.add(parents.get(i).toString()); + } + LOG.debug("STATS-" + jop.toString() + ": FK parent id(s) - " + parentIds); + } + } + } + } + } + + /** + * Get selectivity of reduce sink operators. + * @param ops - reduce sink operators + * @param opsWithPK - reduce sink operators with primary keys + * @return - list of selectivities for the primary key containing operators + */ + private List<Float> getSelectivity(List<Operator<? extends OperatorDesc>> ops, + List<Integer> opsWithPK) { + List<Float> result = Lists.newArrayList(); + for (Integer idx : opsWithPK) { + Operator<? extends OperatorDesc> op = ops.get(idx); + TableScanOperator tsOp = OperatorUtils + .findSingleOperatorUpstream(op, TableScanOperator.class); + long inputRow = tsOp.getStatistics().getNumRows(); + long outputRow = op.getStatistics().getNumRows(); + result.add((float) outputRow / (float) inputRow); + } + return result; + } + + /** + * Returns the index of parents whose join key column statistics ranges are within the specified + * primary key range (inferred as foreign keys).
+ * @param ops - operators + * @param csPK - column statistics of primary key + * @return - list of foreign key containing parent ids + */ + private List<Integer> getForeignKeyCandidates(List<Operator<? extends OperatorDesc>> ops, + ColStatistics csPK) { + List<Integer> result = Lists.newArrayList(); + if (csPK == null || ops == null) { + return result; + } + + for (int i = 0; i < ops.size(); i++) { + Operator<? extends OperatorDesc> op = ops.get(i); + if (op != null && op instanceof ReduceSinkOperator) { + ReduceSinkOperator rsOp = (ReduceSinkOperator) op; + List<ExprNodeDesc> keys = rsOp.getConf().getKeyCols(); + List<String> fqCols = StatsUtils.getFullQualifedColNameFromExprs(keys, + rsOp.getColumnExprMap()); + if (fqCols.size() == 1) { + String joinCol = fqCols.get(0); + if (rsOp.getStatistics() != null) { + ColStatistics cs = rsOp.getStatistics().getColumnStatisticsFromFQColName(joinCol); + if (cs != null && !cs.isPrimaryKey()) { + if (StatsUtils.inferForeignKey(csPK, cs)) { + result.add(i); + } + } + } + } + } + } + return result; + } + + /** + * Returns the index of parents whose join key columns are inferred as primary keys + * @param ops - operators + * @return - list of primary key containing parent ids + */ + private List<Integer> getPrimaryKeyCandidates(List<Operator<? extends OperatorDesc>> ops) { + List<Integer> result = Lists.newArrayList(); + if (ops != null && !ops.isEmpty()) { + for (int i = 0; i < ops.size(); i++) { + Operator<? extends OperatorDesc> op = ops.get(i); + if (op instanceof ReduceSinkOperator) { + ReduceSinkOperator rsOp = (ReduceSinkOperator) op; + List<ExprNodeDesc> keys = rsOp.getConf().getKeyCols(); + List<String> fqCols = StatsUtils.getFullQualifedColNameFromExprs(keys, + rsOp.getColumnExprMap()); + if (fqCols.size() == 1) { + String joinCol = fqCols.get(0); + if (rsOp.getStatistics() != null) { + ColStatistics cs = rsOp.getStatistics().getColumnStatisticsFromFQColName(joinCol); + if (cs != null && cs.isPrimaryKey()) { + result.add(i); + } + } + } + } + } + } + return result; + } + private Long getEasedOutDenominator(List<Long> distinctVals) { // Exponential back-off for NDVs.
// 1) Descending order sort of NDVs diff --git ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java index d1b7dea..c420190 100644 --- ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java +++ ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java @@ -33,12 +33,14 @@ private long numTrues; private long numFalses; private Range range; + private boolean isPrimaryKey; public ColStatistics(String tabAlias, String colName, String colType) { this.setTableAlias(tabAlias); this.setColumnName(colName); this.setColumnType(colType); this.setFullyQualifiedColName(StatsUtils.getFullyQualifiedColumnName(tabAlias, colName)); + this.setPrimaryKey(false); } public ColStatistics() { @@ -150,6 +152,12 @@ public String toString() { sb.append(numTrues); sb.append(" numFalses: "); sb.append(numFalses); + if (range != null) { + sb.append(" "); + sb.append(range); + } + sb.append(" isPrimaryKey: "); + sb.append(isPrimaryKey); return sb.toString(); } @@ -162,24 +170,47 @@ public ColStatistics clone() throws CloneNotSupportedException { clone.setNumNulls(numNulls); clone.setNumTrues(numTrues); clone.setNumFalses(numFalses); + clone.setPrimaryKey(isPrimaryKey); if (range != null ) { clone.setRange(range.clone()); } return clone; } + public boolean isPrimaryKey() { + return isPrimaryKey; + } + + public void setPrimaryKey(boolean isPrimaryKey) { + this.isPrimaryKey = isPrimaryKey; + } + public static class Range { public final Number minValue; public final Number maxValue; + Range(Number minValue, Number maxValue) { super(); this.minValue = minValue; this.maxValue = maxValue; } + @Override public Range clone() { return new Range(minValue, maxValue); } + + @Override + public String toString() { + StringBuilder sb = new StringBuilder(); + sb.append("Range: ["); + sb.append(" min: "); + sb.append(minValue); + sb.append(" max: "); + sb.append(maxValue); + sb.append(" ]"); + return sb.toString(); + } } } diff --git ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java index eb46e32..d42ede4 100644 --- ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java +++ ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java @@ -177,6 +177,9 @@ public static Statistics collectStatistics(HiveConf conf, PrunedPartitionList pa colStats = getTableColumnStats(table, schema, neededColumns); } + // infer if any column can be a primary key based on column statistics + inferAndSetPrimaryKey(stats.getNumRows(), colStats); + stats.setColumnStatsState(deriveStatType(colStats, neededColumns)); stats.addToColumnStats(colStats); } else if (partList != null) { @@ -263,6 +266,9 @@ public static Statistics collectStatistics(HiveConf conf, PrunedPartitionList pa addParitionColumnStats(neededColumns, referencedColumns, schema, table, partList, columnStats); + // infer if any column can be a primary key based on column statistics + inferAndSetPrimaryKey(stats.getNumRows(), columnStats); + stats.addToColumnStats(columnStats); State colState = deriveStatType(columnStats, referencedColumns); if (aggrStats.getPartsFound() != partNames.size() && colState != State.NONE) { @@ -277,6 +283,58 @@ public static Statistics collectStatistics(HiveConf conf, PrunedPartitionList pa return stats; } + + /** + * Based on the provided column statistics and number of rows, this method infers if the column + * can be a primary key.
It checks whether the difference between the min and max values, plus one, + * is equal to the specified number of rows. + * @param numRows - number of rows + * @param colStats - column statistics + */ + public static void inferAndSetPrimaryKey(long numRows, List<ColStatistics> colStats) { + if (colStats != null) { + for (ColStatistics cs : colStats) { + if (cs != null && cs.getRange() != null && cs.getRange().minValue != null && + cs.getRange().maxValue != null) { + if (numRows == + ((cs.getRange().maxValue.longValue() - cs.getRange().minValue.longValue()) + 1)) { + cs.setPrimaryKey(true); + } + } + } + } + } + + /** + * Infer foreign key relationship from given column statistics. + * @param csPK - column statistics of primary key + * @param csFK - column statistics of potential foreign key + * @return - true if a foreign key relationship can be inferred from the value ranges, false otherwise + */ + public static boolean inferForeignKey(ColStatistics csPK, ColStatistics csFK) { + if (csPK != null && csFK != null) { + if (csPK.isPrimaryKey()) { + if (csPK.getRange() != null && csFK.getRange() != null) { + ColStatistics.Range pkRange = csPK.getRange(); + ColStatistics.Range fkRange = csFK.getRange(); + return isWithin(fkRange, pkRange); + } + } + } + return false; + } + + private static boolean isWithin(ColStatistics.Range range1, ColStatistics.Range range2) { + if (range1.minValue != null && range2.minValue != null && range1.maxValue != null && + range2.maxValue != null) { + if (range1.minValue.longValue() >= range2.minValue.longValue() && + range1.maxValue.longValue() <= range2.maxValue.longValue()) { + return true; + } + } + return false; + } + private static void addParitionColumnStats(List<String> neededColumns, List<String> referencedColumns, List<ColumnInfo> schema, Table table, PrunedPartitionList partList, List<ColStatistics> colStats) @@ -533,6 +591,7 @@ public static ColStatistics getColStatistics(ColumnStatisticsObj cso, String tab // Columns statistics for complex datatypes are not supported yet return null; } + return cs; } diff --git ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q new file mode 100644 index 0000000..aa62c60 --- /dev/null +++ ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q @@ -0,0 +1,123 @@ +set hive.stats.fetch.column.stats=true; + +drop table store_sales; +drop table store; +drop table customer_address; + +-- s_store_sk is PK, ss_store_sk is FK +-- ca_address_sk is PK, ss_addr_sk is FK + +create table store_sales +( + ss_sold_date_sk int, + ss_sold_time_sk int, + ss_item_sk int, + ss_customer_sk int, + ss_cdemo_sk int, + ss_hdemo_sk int, + ss_addr_sk int, + ss_store_sk int, + ss_promo_sk int, + ss_ticket_number int, + ss_quantity int, + ss_wholesale_cost float, + ss_list_price float, + ss_sales_price float, + ss_ext_discount_amt float, + ss_ext_sales_price float, + ss_ext_wholesale_cost float, + ss_ext_list_price float, + ss_ext_tax float, + ss_coupon_amt float, + ss_net_paid float, + ss_net_paid_inc_tax float, + ss_net_profit float +) +row format delimited fields terminated by '|'; + +create table store +( + s_store_sk int, + s_store_id string, + s_rec_start_date string, + s_rec_end_date string, + s_closed_date_sk int, + s_store_name string, + s_number_employees int, + s_floor_space int, + s_hours string, + s_manager string, + s_market_id int, + s_geography_class string, + s_market_desc string, + s_market_manager string, + s_division_id int, + s_division_name string, + s_company_id int, + s_company_name string, + s_street_number string, + s_street_name string, + s_street_type string, + s_suite_number string, + s_city string, + s_county
string, + s_state string, + s_zip string, + s_country string, + s_gmt_offset float, + s_tax_precentage float +) +row format delimited fields terminated by '|'; + +create table customer_address +( + ca_address_sk int, + ca_address_id string, + ca_street_number string, + ca_street_name string, + ca_street_type string, + ca_suite_number string, + ca_city string, + ca_county string, + ca_state string, + ca_zip string, + ca_country string, + ca_gmt_offset float, + ca_location_type string +) +row format delimited fields terminated by '|'; + +load data local inpath '../../data/files/store.txt' overwrite into table store; +load data local inpath '../../data/files/store_sales.txt' overwrite into table store_sales; +load data local inpath '../../data/files/customer_address.txt' overwrite into table customer_address; + +analyze table store compute statistics; +analyze table store compute statistics for columns s_store_sk, s_floor_space; +analyze table store_sales compute statistics; +analyze table store_sales compute statistics for columns ss_store_sk, ss_addr_sk, ss_quantity; +analyze table customer_address compute statistics; +analyze table customer_address compute statistics for columns ca_address_sk; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk); + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 0; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_company_id > 0 and ss.ss_quantity > 10; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 0; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk); + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 1000; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 1000; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10; + +explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join customer_address ca on (ca.ca_address_sk = ss.ss_addr_sk); + +drop table store_sales; +drop table store; +drop table customer_address; diff --git ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out new file mode 100644 index 0000000..040dd4e --- /dev/null +++ ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out @@ -0,0 +1,987 @@ +PREHOOK: query: drop table store_sales +PREHOOK: type: DROPTABLE +POSTHOOK: query: drop table store_sales +POSTHOOK: type: DROPTABLE +PREHOOK: query: drop table store +PREHOOK: type: DROPTABLE +POSTHOOK: query: drop table store +POSTHOOK: type: DROPTABLE +PREHOOK: query: drop table customer_address +PREHOOK: type: DROPTABLE +POSTHOOK: query: drop table customer_address +POSTHOOK: type: DROPTABLE +PREHOOK: query: -- s_store_sk is PK, ss_store_sk is FK +-- ca_address_sk is PK, ss_addr_sk is FK + +create 
table store_sales +( + ss_sold_date_sk int, + ss_sold_time_sk int, + ss_item_sk int, + ss_customer_sk int, + ss_cdemo_sk int, + ss_hdemo_sk int, + ss_addr_sk int, + ss_store_sk int, + ss_promo_sk int, + ss_ticket_number int, + ss_quantity int, + ss_wholesale_cost float, + ss_list_price float, + ss_sales_price float, + ss_ext_discount_amt float, + ss_ext_sales_price float, + ss_ext_wholesale_cost float, + ss_ext_list_price float, + ss_ext_tax float, + ss_coupon_amt float, + ss_net_paid float, + ss_net_paid_inc_tax float, + ss_net_profit float +) +row format delimited fields terminated by '|' +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@store_sales +POSTHOOK: query: -- s_store_sk is PK, ss_store_sk is FK +-- ca_address_sk is PK, ss_addr_sk is FK + +create table store_sales +( + ss_sold_date_sk int, + ss_sold_time_sk int, + ss_item_sk int, + ss_customer_sk int, + ss_cdemo_sk int, + ss_hdemo_sk int, + ss_addr_sk int, + ss_store_sk int, + ss_promo_sk int, + ss_ticket_number int, + ss_quantity int, + ss_wholesale_cost float, + ss_list_price float, + ss_sales_price float, + ss_ext_discount_amt float, + ss_ext_sales_price float, + ss_ext_wholesale_cost float, + ss_ext_list_price float, + ss_ext_tax float, + ss_coupon_amt float, + ss_net_paid float, + ss_net_paid_inc_tax float, + ss_net_profit float +) +row format delimited fields terminated by '|' +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@store_sales +PREHOOK: query: create table store +( + s_store_sk int, + s_store_id string, + s_rec_start_date string, + s_rec_end_date string, + s_closed_date_sk int, + s_store_name string, + s_number_employees int, + s_floor_space int, + s_hours string, + s_manager string, + s_market_id int, + s_geography_class string, + s_market_desc string, + s_market_manager string, + s_division_id int, + s_division_name string, + s_company_id int, + s_company_name string, + s_street_number string, + s_street_name string, + s_street_type string, + s_suite_number string, + s_city string, + s_county string, + s_state string, + s_zip string, + s_country string, + s_gmt_offset float, + s_tax_precentage float +) +row format delimited fields terminated by '|' +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@store +POSTHOOK: query: create table store +( + s_store_sk int, + s_store_id string, + s_rec_start_date string, + s_rec_end_date string, + s_closed_date_sk int, + s_store_name string, + s_number_employees int, + s_floor_space int, + s_hours string, + s_manager string, + s_market_id int, + s_geography_class string, + s_market_desc string, + s_market_manager string, + s_division_id int, + s_division_name string, + s_company_id int, + s_company_name string, + s_street_number string, + s_street_name string, + s_street_type string, + s_suite_number string, + s_city string, + s_county string, + s_state string, + s_zip string, + s_country string, + s_gmt_offset float, + s_tax_precentage float +) +row format delimited fields terminated by '|' +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@store +PREHOOK: query: create table customer_address +( + ca_address_sk int, + ca_address_id string, + ca_street_number string, + ca_street_name string, + ca_street_type string, + ca_suite_number string, + ca_city string, + ca_county string, + ca_state string, + ca_zip string, + ca_country string, + ca_gmt_offset float, + ca_location_type string +) +row format delimited fields 
terminated by '|' +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@customer_address +POSTHOOK: query: create table customer_address +( + ca_address_sk int, + ca_address_id string, + ca_street_number string, + ca_street_name string, + ca_street_type string, + ca_suite_number string, + ca_city string, + ca_county string, + ca_state string, + ca_zip string, + ca_country string, + ca_gmt_offset float, + ca_location_type string +) +row format delimited fields terminated by '|' +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@customer_address +PREHOOK: query: load data local inpath '../../data/files/store.txt' overwrite into table store +PREHOOK: type: LOAD +#### A masked pattern was here #### +PREHOOK: Output: default@store +POSTHOOK: query: load data local inpath '../../data/files/store.txt' overwrite into table store +POSTHOOK: type: LOAD +#### A masked pattern was here #### +POSTHOOK: Output: default@store +PREHOOK: query: load data local inpath '../../data/files/store_sales.txt' overwrite into table store_sales +PREHOOK: type: LOAD +#### A masked pattern was here #### +PREHOOK: Output: default@store_sales +POSTHOOK: query: load data local inpath '../../data/files/store_sales.txt' overwrite into table store_sales +POSTHOOK: type: LOAD +#### A masked pattern was here #### +POSTHOOK: Output: default@store_sales +PREHOOK: query: load data local inpath '../../data/files/customer_address.txt' overwrite into table customer_address +PREHOOK: type: LOAD +#### A masked pattern was here #### +PREHOOK: Output: default@customer_address +POSTHOOK: query: load data local inpath '../../data/files/customer_address.txt' overwrite into table customer_address +POSTHOOK: type: LOAD +#### A masked pattern was here #### +POSTHOOK: Output: default@customer_address +PREHOOK: query: analyze table store compute statistics +PREHOOK: type: QUERY +PREHOOK: Input: default@store +PREHOOK: Output: default@store +POSTHOOK: query: analyze table store compute statistics +POSTHOOK: type: QUERY +POSTHOOK: Input: default@store +POSTHOOK: Output: default@store +PREHOOK: query: analyze table store compute statistics for columns s_store_sk, s_floor_space +PREHOOK: type: QUERY +PREHOOK: Input: default@store +#### A masked pattern was here #### +POSTHOOK: query: analyze table store compute statistics for columns s_store_sk, s_floor_space +POSTHOOK: type: QUERY +POSTHOOK: Input: default@store +#### A masked pattern was here #### +PREHOOK: query: analyze table store_sales compute statistics +PREHOOK: type: QUERY +PREHOOK: Input: default@store_sales +PREHOOK: Output: default@store_sales +POSTHOOK: query: analyze table store_sales compute statistics +POSTHOOK: type: QUERY +POSTHOOK: Input: default@store_sales +POSTHOOK: Output: default@store_sales +PREHOOK: query: analyze table store_sales compute statistics for columns ss_store_sk, ss_addr_sk, ss_quantity +PREHOOK: type: QUERY +PREHOOK: Input: default@store_sales +#### A masked pattern was here #### +POSTHOOK: query: analyze table store_sales compute statistics for columns ss_store_sk, ss_addr_sk, ss_quantity +POSTHOOK: type: QUERY +POSTHOOK: Input: default@store_sales +#### A masked pattern was here #### +PREHOOK: query: analyze table customer_address compute statistics +PREHOOK: type: QUERY +PREHOOK: Input: default@customer_address +PREHOOK: Output: default@customer_address +POSTHOOK: query: analyze table customer_address compute statistics +POSTHOOK: type: QUERY +POSTHOOK: Input: 
default@customer_address +POSTHOOK: Output: default@customer_address +PREHOOK: query: analyze table customer_address compute statistics for columns ca_address_sk +PREHOOK: type: QUERY +PREHOOK: Input: default@customer_address +#### A masked pattern was here #### +POSTHOOK: query: analyze table customer_address compute statistics for columns ca_address_sk +POSTHOOK: type: QUERY +POSTHOOK: Input: default@customer_address +#### A masked pattern was here #### +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: ss_store_sk is not null (type: boolean) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + outputColumnNames: _col0 + Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 0 +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 0 +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (s_store_sk is not null and (s_store_sk > 0)) (type: boolean) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + 
Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (ss_store_sk is not null and (ss_store_sk > 0)) (type: boolean) + Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + outputColumnNames: _col0 + Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_company_id > 0 and ss.ss_quantity > 10 +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_company_id > 0 and ss.ss_quantity > 10 +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: PARTIAL + Filter Operator + predicate: (s_store_sk is not null and (s_company_id > 0)) (type: boolean) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: PARTIAL + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: PARTIAL + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (ss_store_sk is not null and (ss_quantity > 10)) (type: boolean) + Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + outputColumnNames: _col0 + Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: PARTIAL + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: PARTIAL + File Output Operator + compressed: false + Statistics: Num rows: 107 Data size: 428 Basic 
stats: COMPLETE Column stats: PARTIAL + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 0 +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 0 +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (s_store_sk is not null and (s_floor_space > 0)) (type: boolean) + Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: ss_store_sk is not null (type: boolean) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10 +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10 +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 12 Data 
size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (ss_store_sk is not null and (ss_quantity > 10)) (type: boolean) + Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s1 + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: ss_store_sk is not null (type: boolean) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + Inner Join 1 to 2 + condition expressions: + 0 
{KEY.reducesinkkey0} + 1 + 2 + outputColumnNames: _col0 + Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 1000 +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 1000 +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s1 + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (s_store_sk is not null and (s_store_sk > 1000)) (type: boolean) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (s_store_sk is not null and (s_store_sk > 1000)) (type: boolean) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (ss_store_sk is not null and (ss_store_sk > 1000)) (type: boolean) + Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + Inner Join 1 to 2 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + 2 + outputColumnNames: _col0 + Statistics: Num rows: 35 Data size: 140 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 35 Data size: 140 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 35 Data size: 140 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 1000 +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 1000 +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s1 + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (s_store_sk is not null and (s_floor_space > 1000)) (type: boolean) + Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: ss_store_sk is not null (type: boolean) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + Inner Join 1 to 2 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + 2 + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10 +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10 
+POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: s1 + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (ss_store_sk is not null and (ss_quantity > 10)) (type: boolean) + Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + Inner Join 1 to 2 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 + 2 + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join customer_address ca on (ca.ca_address_sk = ss.ss_addr_sk) +PREHOOK: type: QUERY +POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join customer_address ca on (ca.ca_address_sk = ss.ss_addr_sk) +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-2 is a root stage + Stage-1 depends on stages: Stage-2 + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-2 + Map Reduce + Map Operator Tree: + TableScan + alias: s + Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: s_store_sk is not null (type: boolean) + Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: s_store_sk (type: int) + sort order: + + Map-reduce partition columns: s_store_sk (type: int) + Statistics: Num rows: 
12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + alias: ss + Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: (ss_store_sk is not null and ss_addr_sk is not null) (type: boolean) + Statistics: Num rows: 916 Data size: 7012 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ss_store_sk (type: int) + sort order: + + Map-reduce partition columns: ss_store_sk (type: int) + Statistics: Num rows: 916 Data size: 7012 Basic stats: COMPLETE Column stats: COMPLETE + value expressions: ss_addr_sk (type: int) + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + condition expressions: + 0 {KEY.reducesinkkey0} + 1 {VALUE._col6} + outputColumnNames: _col0, _col38 + Statistics: Num rows: 916 Data size: 7328 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + table: + input format: org.apache.hadoop.mapred.SequenceFileInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat + serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe + + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: ca + Statistics: Num rows: 20 Data size: 2114 Basic stats: COMPLETE Column stats: COMPLETE + Filter Operator + predicate: ca_address_sk is not null (type: boolean) + Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator + key expressions: ca_address_sk (type: int) + sort order: + + Map-reduce partition columns: ca_address_sk (type: int) + Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE + TableScan + Reduce Output Operator + key expressions: _col38 (type: int) + sort order: + + Map-reduce partition columns: _col38 (type: int) + Statistics: Num rows: 916 Data size: 7328 Basic stats: COMPLETE Column stats: COMPLETE + value expressions: _col0 (type: int) + Reduce Operator Tree: + Join Operator + condition map: + Inner Join 0 to 1 + condition expressions: + 0 {VALUE._col0} + 1 + outputColumnNames: _col0 + Statistics: Num rows: 916 Data size: 3664 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator + expressions: _col0 (type: int) + outputColumnNames: _col0 + Statistics: Num rows: 916 Data size: 3664 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator + compressed: false + Statistics: Num rows: 916 Data size: 3664 Basic stats: COMPLETE Column stats: COMPLETE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: drop table store_sales +PREHOOK: type: DROPTABLE +PREHOOK: Input: default@store_sales +PREHOOK: Output: default@store_sales +POSTHOOK: query: drop table store_sales +POSTHOOK: type: DROPTABLE +POSTHOOK: Input: default@store_sales +POSTHOOK: Output: default@store_sales +PREHOOK: query: drop table store +PREHOOK: type: DROPTABLE +PREHOOK: Input: default@store +PREHOOK: Output: default@store +POSTHOOK: query: drop table store +POSTHOOK: type: DROPTABLE +POSTHOOK: Input: default@store +POSTHOOK: Output: default@store +PREHOOK: query: drop table customer_address +PREHOOK: type: DROPTABLE +PREHOOK: Input: default@customer_address +PREHOOK: Output: default@customer_address +POSTHOOK: query: drop table customer_address 
+POSTHOOK: type: DROPTABLE +POSTHOOK: Input: default@customer_address +POSTHOOK: Output: default@customer_address
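
Reviewer note, not part of the patch: for a single-attribute PK-FK join the new code estimates the output row count as rows(FK side after filters) multiplied by the product of the PK sides' filter selectivities, where each selectivity is the PK side's ReduceSink output rows divided by its TableScan input rows. The standalone sketch below replays that arithmetic against the numbers in the golden file above (store filtered from 12 rows to 4 by s_floor_space > 0, store_sales keeping 964 rows after the ss_store_sk is not null filter, giving 964 * 4/12 = 321). All names in the sketch (PkFkEstimateSketch, SimpleColStats, Relation) are illustrative inventions, not Hive APIs, and the ss_store_sk value range is assumed to fall within the s_store_sk range.

public class PkFkEstimateSketch {

  /** Join-key column statistics reduced to what the heuristic needs. */
  static final class SimpleColStats {
    final long min;
    final long max;
    SimpleColStats(long min, long max) { this.min = min; this.max = max; }
  }

  /** One join input: row counts before/after filters plus its join-key stats. */
  static final class Relation {
    final long inputRows;   // rows at the TableScan
    final long outputRows;  // rows surviving filters at the ReduceSink
    final SimpleColStats key;
    Relation(long inputRows, long outputRows, SimpleColStats key) {
      this.inputRows = inputRows;
      this.outputRows = outputRows;
      this.key = key;
    }
  }

  // Mirrors inferAndSetPrimaryKey: a column is a PK candidate when its value
  // range is dense, i.e. (max - min + 1) equals the table's row count.
  static boolean isPrimaryKeyCandidate(long numRows, SimpleColStats cs) {
    return numRows == cs.max - cs.min + 1;
  }

  // Mirrors inferForeignKey/isWithin: the candidate FK's value range must lie
  // entirely within the PK's value range.
  static boolean isForeignKeyCandidate(SimpleColStats pk, SimpleColStats fk) {
    return fk.min >= pk.min && fk.max <= pk.max;
  }

  // Mirrors the inferPKFKRelationship estimate: FK-side rows scaled by the
  // product of the PK sides' filter selectivities (assumes independent,
  // conjunctive predicates).
  static long estimateJoinRows(Relation fkSide, Relation... pkSides) {
    double selectivity = 1.0;
    for (Relation pk : pkSides) {
      selectivity *= (double) pk.outputRows / (double) pk.inputRows;
    }
    return (long) (fkSide.outputRows * selectivity);
  }

  public static void main(String[] args) {
    // Numbers from the golden file: store has 12 rows with a dense s_store_sk
    // range 1..12; the s_floor_space > 0 filter keeps 4 of them. store_sales
    // keeps 964 rows after the ss_store_sk is not null filter; its key range
    // is assumed to be contained in 1..12.
    Relation store = new Relation(12, 4, new SimpleColStats(1, 12));
    Relation storeSales = new Relation(1000, 964, new SimpleColStats(1, 12));

    System.out.println(isPrimaryKeyCandidate(12, store.key));             // true
    System.out.println(isForeignKeyCandidate(store.key, storeSales.key)); // true
    System.out.println(estimateJoinRows(storeSales, store));              // 321
  }
}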