diff --git README.txt README.txt
index 0adbff6..7d00f56 100644
--- README.txt
+++ README.txt
@@ -1,7 +1,7 @@
 Apache Hive @VERSION@
 =================
 
-Hive is a data warehouse system for Hadoop that facilitates
+Apache Hive is a data warehouse system for Hadoop that facilitates
 easy data summarization, ad-hoc querying and analysis of large
 datasets stored in Hadoop compatible file systems. Hive provides a
 mechanism to put structure on this data and query the data using a
@@ -43,13 +43,13 @@ Getting Started
 ===============
 
 - Installation Instructions and a quick tutorial:
-  http://wiki.apache.org/hadoop/Hive/GettingStarted
+  https://cwiki.apache.org/confluence/display/Hive/GettingStarted
 
 - A longer tutorial that covers more features of HiveQL:
-  http://wiki.apache.org/hadoop/Hive/Tutorial
+  https://cwiki.apache.org/confluence/display/Hive/Tutorial
 
 - The HiveQL Language Manual:
-  http://wiki.apache.org/hadoop/Hive/LanguageManual
+  https://cwiki.apache.org/confluence/display/Hive/LanguageManual
 
 
 Requirements
diff --git RELEASE_NOTES.txt RELEASE_NOTES.txt
index 51339a4..1aac3ce 100644
--- RELEASE_NOTES.txt
+++ RELEASE_NOTES.txt
@@ -1,3 +1,305 @@
+Release Notes - Hive - Version 0.7.0
+
+** New Feature
+    * [HIVE-78] - Authorization infrastructure for Hive
+    * [HIVE-417] - Implement Indexing in Hive
+    * [HIVE-471] - Add reflect() UDF for reflective invocation of Java methods
+    * [HIVE-537] - Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
+    * [HIVE-842] - Authentication Infrastructure for Hive
+    * [HIVE-1096] - Hive Variables
+    * [HIVE-1293] - Concurrency Model for Hive
+    * [HIVE-1304] - add row_sequence UDF
+    * [HIVE-1405] - hive command line option -i to run an init file before other SQL commands
+    * [HIVE-1408] - add option to let hive automatically run in local mode based on tunable heuristics
+    * [HIVE-1413] - bring a table/partition offline
+    * [HIVE-1438] - sentences() UDF for natural language tokenization
+    * [HIVE-1481] - ngrams() UDAF for estimating top-k n-gram frequencies
+    * [HIVE-1514] - Be able to modify a partition's fileformat and file location information.
+    * [HIVE-1518] - context_ngrams() UDAF for estimating top-k contextual n-grams
+    * [HIVE-1528] - Add json_tuple() UDTF function
+    * [HIVE-1529] - Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
+    * [HIVE-1549] - Add ANSI SQL correlation aggregate function CORR(X,Y).
+    * [HIVE-1609] - Support partition filtering in metastore
+    * [HIVE-1624] - Patch to allows scripts in S3 location
+    * [HIVE-1636] - Implement "SHOW TABLES {FROM | IN} db_name"
+    * [HIVE-1659] - parse_url_tuple: a UDTF version of parse_url
+    * [HIVE-1661] - Default values for parameters
+    * [HIVE-1779] - Implement GenericUDF str_to_map
+    * [HIVE-1790] - Patch to support HAVING clause in Hive
+    * [HIVE-1792] - track the joins which are being converted to map-join automatically
+    * [HIVE-1818] - Call frequency and duration metrics for HiveMetaStore via jmx
+    * [HIVE-1819] - maintain lastAccessTime in the metastore
+    * [HIVE-1820] - Make Hive database data center aware
+    * [HIVE-1827] - Add a new local mode flag in Task.
+    * [HIVE-1835] - Better auto-complete for Hive
+    * [HIVE-1840] - Support ALTER DATABASE to change database properties
+    * [HIVE-1856] - Implement DROP TABLE/VIEW ... IF EXISTS
+    * [HIVE-1858] - Implement DROP {PARTITION, INDEX, TEMPORARY FUNCTION} IF EXISTS
+    * [HIVE-1881] - Make the MetaStore filesystem interface pluggable via the hive.metastore.fs.handler.class configuration property
+    * [HIVE-1889] - add an option (hive.index.compact.file.ignore.hdfs) to ignore HDFS location stored in index files.
+    * [HIVE-1971] - Verbose/echo mode for the Hive CLI
+
+** Improvement
+    * [HIVE-138] - Provide option to export a HEADER
+    * [HIVE-474] - Support for distinct selection on two or more columns
+    * [HIVE-558] - describe extended table/partition output is cryptic
+    * [HIVE-1126] - Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
+    * [HIVE-1211] - Tapping logs from child processes
+    * [HIVE-1226] - support filter pushdown against non-native tables
+    * [HIVE-1229] - replace dependencies on HBase deprecated API
+    * [HIVE-1235] - use Ivy for fetching HBase dependencies
+    * [HIVE-1264] - Make Hive work with Hadoop security
+    * [HIVE-1378] - Return value for map, array, and struct needs to return a string
+    * [HIVE-1394] - do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
+    * [HIVE-1414] - automatically invoke .hiverc init script
+    * [HIVE-1415] - add CLI command for executing a SQL script
+    * [HIVE-1430] - serializing/deserializing the query plan is useless and expensive
+    * [HIVE-1441] - Extend ivy offline mode to cover metastore downloads
+    * [HIVE-1443] - Add support to turn off bucketing with ALTER TABLE
+    * [HIVE-1447] - Speed up reflection method calls in GenericUDFBridge and GenericUDAFBridge
+    * [HIVE-1456] - potentail NullPointerException
+    * [HIVE-1463] - hive output file names are unnecessarily large
+    * [HIVE-1469] - replace isArray() calls and remove LOG.isInfoEnabled() in Operator.forward()
+    * [HIVE-1495] - supply correct information to hooks and lineage for index rebuild
+    * [HIVE-1497] - support COMMENT clause on CREATE INDEX, and add new command for SHOW INDEXES
+    * [HIVE-1498] - support IDXPROPERTIES on CREATE INDEX
+    * [HIVE-1512] - Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version
+    * [HIVE-1513] - hive starter scripts should load admin/user supplied script for configurability
+    * [HIVE-1517] - ability to select across a database
+    * [HIVE-1533] - Use ZooKeeper from maven
+    * [HIVE-1536] - Add support for JDBC PreparedStatements
+    * [HIVE-1546] - Ability to plug custom Semantic Analyzers for Hive Grammar
+    * [HIVE-1581] - CompactIndexInputFormat should create split only for files in the index output file.
+    * [HIVE-1605] - regression and improvements in handling NULLs in joins
+    * [HIVE-1611] - Add alternative search-provider to Hive site
+    * [HIVE-1616] - Add ProtocolBuffersStructObjectInspector
+    * [HIVE-1617] - ScriptOperator's AutoProgressor can lead to an infinite loop
+    * [HIVE-1622] - Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
+    * [HIVE-1638] - convert commonly used udfs to generic udfs
+    * [HIVE-1641] - add map joined table to distributed cache
+    * [HIVE-1642] - Convert join queries to map-join based on size of table/row
+    * [HIVE-1645] - ability to specify parent directory for zookeeper lock manager
+    * [HIVE-1655] - Adding consistency check at jobClose() when committing dynamic partitions
+    * [HIVE-1660] - Change get_partitions_ps to pass partition filter to database
+    * [HIVE-1692] - FetchOperator.getInputFormatFromCache hides causal exception
+    * [HIVE-1701] - drop support for pre-0.20 Hadoop versions
+    * [HIVE-1704] - remove Hadoop 0.17 specific test reference logs
+    * [HIVE-1738] - Optimize Key Comparison in GroupByOperator
+    * [HIVE-1743] - Group-by to determine equals of Keys in reverse order
+    * [HIVE-1746] - Support for using ALTER to set IDXPROPERTIES
+    * [HIVE-1749] - ExecMapper and ExecReducer: reduce function calls to l4j.isInfoEnabled()
+    * [HIVE-1750] - Remove Partition Filtering Conditions when Possible
+    * [HIVE-1751] - Optimize ColumnarStructObjectInspector.getStructFieldData()
+    * [HIVE-1754] - Remove JDBM component from Map Join
+    * [HIVE-1757] - test cleanup for Hive-1641
+    * [HIVE-1758] - optimize group by hash map memory
+    * [HIVE-1761] - Support show locks for a particular table
+    * [HIVE-1765] - Add queryid while locking
+    * [HIVE-1768] - Update transident_lastDdlTime only if not specified
+    * [HIVE-1782] - add more debug information for hive locking
+    * [HIVE-1783] - CommonJoinOperator optimize the case of 1:1 join
+    * [HIVE-1785] - change Pre/Post Query Hooks to take in 1 parameter: HookContext
+    * [HIVE-1786] - Improve documentation for str_to_map() UDF
+    * [HIVE-1787] - optimize the code path when there are no outer joins
+    * [HIVE-1796] - dumps time at which lock was taken along with the queryid in show locks extended
+    * [HIVE-1797] - Compressed the hashtable dump file before put into distributed cache
+    * [HIVE-1798] - Clear empty files in Hive
+    * [HIVE-1801] - HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice
+    * [HIVE-1811] - Show the time the local task takes
+    * [HIVE-1824] - create a new ZooKeeper instance when retrying lock, and more info for debug
+    * [HIVE-1831] - Add a option to run task to check map-join possibility in non-local mode
+    * [HIVE-1834] - more debugging for locking
+    * [HIVE-1843] - add an option in dynamic partition inserts to throw an error if 0 partitions are created
+    * [HIVE-1852] - Reduce unnecessary DFSClient.rename() calls
+    * [HIVE-1855] - Include Process ID in the log4j log file name
+    * [HIVE-1865] - redo zookeeper hive lock manager
+    * [HIVE-1899] - add a factory method for creating a synchronized wrapper for IMetaStoreClient
+    * [HIVE-1900] - a mapper should be able to span multiple partitions
+    * [HIVE-1907] - Store jobid in ExecDriver
+    * [HIVE-1910] - Provide config parameters to control cache object pinning
+    * [HIVE-1923] - Allow any type of stats publisher and aggregator in addition to HBase and JDBC
+    * [HIVE-1929] - Find a way to disable owner grants
+    * [HIVE-1931] - Improve the implementation of the METASTORE_CACHE_PINOBJTYPES config
+    * [HIVE-1948] - Have audit logging in the Metastore
+    * [HIVE-1956] - "Provide DFS initialization script for Hive
+    * [HIVE-1961] - Make Stats gathering more flexible with timeout and atomicity
+    * [HIVE-1962] - make a libthrift.jar and libfb303.jar in dist package for backward compatibility
+    * [HIVE-1970] - Modify build to run all tests regardless of subproject failures
+    * [HIVE-1978] - Hive SymlinkTextInputFormat does not estimate input size correctly
+
+** Bug
+    * [HIVE-307] - "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the same name
+    * [HIVE-741] - NULL is not handled correctly in join
+    * [HIVE-1203] - HiveInputFormat.getInputFormatFromCache "swallows" cause exception when throwing IOExcpetion
+    * [HIVE-1305] - add progress in join and groupby
+    * [HIVE-1376] - Simple UDAFs with more than 1 parameter crash on empty row query
+    * [HIVE-1385] - UDF field() doesn't work
+    * [HIVE-1416] - Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
+    * [HIVE-1422] - skip counter update when RunningJob.getCounters() returns null
+    * [HIVE-1440] - FetchOperator(mapjoin) does not work with RCFile
+    * [HIVE-1448] - bug in 'set fileformat'
+    * [HIVE-1453] - Make Eclipse launch templates auto-adjust to Hive version number changes
+    * [HIVE-1462] - Reporting progress in FileSinkOperator works in multiple directory case
+    * [HIVE-1465] - hive-site.xml ${user.name} not replaced for local-file derby metastore connection URL
+    * [HIVE-1470] - percentile_approx() fails with more than 1 reducer
+    * [HIVE-1471] - CTAS should unescape the column name in the select-clause.
+    * [HIVE-1473] - plan file should have a high replication factor
+    * [HIVE-1475] - .gitignore files being placed in test warehouse directories causing build failure
+    * [HIVE-1489] - TestCliDriver -Doverwrite=true does not put the file in the correct directory
+    * [HIVE-1491] - fix or disable loadpart_err.q
+    * [HIVE-1494] - Index followup: remove sort by clause and fix a bug in collect_set udaf
+    * [HIVE-1501] - when generating reentrant INSERT for index rebuild, quote identifiers using backticks
+    * [HIVE-1508] - Add cleanup method to HiveHistory class
+    * [HIVE-1509] - Monitor the working set of the number of files
+    * [HIVE-1510] - HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
+    * [HIVE-1520] - hive.mapred.local.mem should only be used in case of local mode job submissions
+    * [HIVE-1523] - ql tests no longer work in miniMR mode
+    * [HIVE-1532] - Replace globStatus with listStatus inside Hive.java's replaceFiles.
+    * [HIVE-1534] - Join filters do not work correctly with outer joins
+    * [HIVE-1535] - alter partition should throw exception if the specified partition does not exist.
+    * [HIVE-1547] - Unarchiving operation throws NPE
+    * [HIVE-1548] - populate inputs and outputs for all statements
+    * [HIVE-1556] - Fix TestContribCliDriver test
+    * [HIVE-1561] - smb_mapjoin_8.q returns different results in miniMr mode
+    * [HIVE-1563] - HBase tests broken
+    * [HIVE-1564] - bucketizedhiveinputformat.q fails in minimr mode
+    * [HIVE-1570] - referencing an added file by it's name in a transform script does not work in hive local mode
+    * [HIVE-1578] - Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
+    * [HIVE-1580] - cleanup ExecDriver.progress
+    * [HIVE-1583] - Hive should not override Hadoop specific system properties
+    * [HIVE-1584] - wrong log files in contrib client positive
+    * [HIVE-1589] - Add HBase/ZK JARs to Eclipse classpath
+    * [HIVE-1593] - udtf_explode.q is an empty file
+    * [HIVE-1598] - use SequenceFile rather than TextFile format for hive query results
+    * [HIVE-1600] - need to sort hook input/output lists for test result determinism
+    * [HIVE-1601] - Hadoop 0.17 ant test broken by HIVE-1523
+    * [HIVE-1606] - For a null value in a string column, JDBC driver returns the string "NULL"
+    * [HIVE-1607] - Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675
+    * [HIVE-1614] - UDTF json_tuple should return null row when input is not a valid JSON string
+    * [HIVE-1628] - Fix Base64TextInputFormat to be compatible with commons codec 1.4
+    * [HIVE-1629] - Patch to fix hashCode method in DoubleWritable class
+    * [HIVE-1630] - bug in NO_DROP
+    * [HIVE-1633] - CombineHiveInputFormat fails with "cannot find dir for emptyFile"
+    * [HIVE-1639] - ExecDriver.addInputPaths() error if partition name contains a comma
+    * [HIVE-1647] - Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )
+    * [HIVE-1650] - TestContribNegativeCliDriver fails
+    * [HIVE-1656] - All TestJdbcDriver test cases fail in Eclipse unless a property is added in run config
+    * [HIVE-1657] - join results are displayed wrongly for some complex joins using select *
+    * [HIVE-1658] - Fix describe * [extended] column formatting
+    * [HIVE-1663] - ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty
+    * [HIVE-1664] - Eclipse build broken
+    * [HIVE-1670] - MapJoin throws EOFExeption when the mapjoined table has 0 column selected
+    * [HIVE-1671] - multithreading on Context.pathToCS
+    * [HIVE-1673] - Create table bug causes the row format property lost when serde is specified.
+    * [HIVE-1674] - count(*) returns wrong result when a mapper returns empty results
+    * [HIVE-1678] - NPE in MapJoin
+    * [HIVE-1688] - In the MapJoinOperator, the code uses tag as alias, which is not always true
+    * [HIVE-1691] - ANALYZE TABLE command should check columns in partition spec
+    * [HIVE-1699] - incorrect partition pruning ANALYZE TABLE
+    * [HIVE-1707] - bug when different partitions are present in different dfs
+    * [HIVE-1711] - CREATE TABLE LIKE should not set stats in the new table
+    * [HIVE-1712] - Migrating metadata from derby to mysql thrown NullPointerException
+    * [HIVE-1713] - duplicated MapRedTask in Multi-table inserts mixed with FileSinkOperator and ReduceSinkOperator
+    * [HIVE-1716] - make TestHBaseCliDriver use dynamic ports to avoid conflicts with already-running services
+    * [HIVE-1717] - ant clean should delete stats database
+    * [HIVE-1720] - hbase_stats.q is failing
+    * [HIVE-1737] - Two Bugs for Estimating Row Sizes in GroupByOperator
+    * [HIVE-1742] - Fix Eclipse templates (and use Ivy metadata to generate Eclipse library dependencies)
+    * [HIVE-1748] - Statistics broken for tables with size in excess of Integer.MAX_VALUE
+    * [HIVE-1753] - HIVE 1633 hit for Stage2 jobs with CombineHiveInputFormat
+    * [HIVE-1756] - failures in fatal.q in TestNegativeCliDriver
+    * [HIVE-1759] - Many important broken links on Hive web page
+    * [HIVE-1760] - Mismatched open/commit transaction calls in case of connection retry
+    * [HIVE-1767] - Merge files does not work with dynamic partition
+    * [HIVE-1769] - pcr.q output is non-deterministic
+    * [HIVE-1771] - ROUND(infinity) chokes
+    * [HIVE-1775] - Assertation on inputObjInspectors.length in Groupy operator
+    * [HIVE-1776] - parallel execution and auto-local mode combine to place plan file in wrong file system
+    * [HIVE-1777] - Outdated comments for GenericUDTF.close()
+    * [HIVE-1780] - Typo in hive-default.xml
+    * [HIVE-1781] - outputs not populated for dynamic partitions at compile time
+    * [HIVE-1794] - GenericUDFOr and GenericUDFAnd cannot receive boolean typed object
+    * [HIVE-1795] - outputs not correctly populated for alter table
+    * [HIVE-1804] - Mapjoin will fail if there are no files associating with the join tables
+    * [HIVE-1806] - The merge criteria on dynamic partitons should be per partiton
+    * [HIVE-1807] - No Element found exception in BucketMapJoinOptimizer
+    * [HIVE-1808] - bug in auto_join25.q
+    * [HIVE-1809] - Hive comparison operators are broken for NaN values
+    * [HIVE-1812] - spurious rmr failure messages when inserting with dynamic partitioning
+    * [HIVE-1828] - show locks should not use getTable()/getPartition
+    * [HIVE-1829] - Fix intermittent failures in TestRemoteMetaStore
+    * [HIVE-1830] - mappers in group followed by joins may die OOM
+    * [HIVE-1844] - Hanging hive client caused by TaskRunner's OutOfMemoryError
+    * [HIVE-1845] - Some attributes in the Eclipse template file is deprecated
+    * [HIVE-1846] - change hive assumption that local mode mappers/reducers always run in same jvm
+    * [HIVE-1848] - bug in MAPJOIN
+    * [HIVE-1849] - add more logging to partition pruning
+    * [HIVE-1853] - downgrade JDO version
+    * [HIVE-1854] - Temporarily disable metastore tests for listPartitionsByFilter()
+    * [HIVE-1857] - mixed case tablename on lefthand side of LATERAL VIEW results in query failing with confusing error message
+    * [HIVE-1860] - Hive's smallint datatype is not supported by the Hive JDBC driver
+    * [HIVE-1861] - Hive's float datatype is not supported by the Hive JDBC driver
+    * [HIVE-1862] - Revive partition filtering in the Hive MetaStore
+    * [HIVE-1863] - Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
+    * [HIVE-1864] - test load_overwrite.q fails
+    * [HIVE-1867] - Add mechanism for disabling tests with intermittent failures
+    * [HIVE-1870] - TestRemoteHiveMetaStore.java accidentally deleted during commit of HIVE-1845
+    * [HIVE-1871] - bug introduced by HIVE-1806
+    * [HIVE-1873] - Fix 'tar' build target broken in HIVE-1526
+    * [HIVE-1874] - fix HBase filter pushdown broken by HIVE-1638
+    * [HIVE-1878] - Set the version of Hive trunk to '0.7.0-SNAPSHOT' to avoid confusing it with a release
+    * [HIVE-1896] - HBase and Contrib JAR names are missing version numbers
+    * [HIVE-1897] - Alter command execution "when HDFS is down" results in holding stale data in MetaStore
+    * [HIVE-1902] - create script for the metastore upgrade due to HIVE-78
+    * [HIVE-1903] - Can't join HBase tables if one's name is the beginning of the other
+    * [HIVE-1908] - FileHandler leak on partial iteration of the resultset.
+    * [HIVE-1912] - Double escaping special chars when removing old partitions in rmr
+    * [HIVE-1913] - use partition level serde properties
+    * [HIVE-1914] - failures in testhbaseclidriver
+    * [HIVE-1915] - authorization on database level is broken.
+    * [HIVE-1917] - CTAS (create-table-as-select) throws exception when showing results
+    * [HIVE-1927] - Fix TestHadoop20SAuthBridge failure on Hudson
+    * [HIVE-1928] - GRANT/REVOKE should handle privileges as tokens, not identifiers
+    * [HIVE-1934] - alter table rename messes the location
+    * [HIVE-1936] - hive.semantic.analyzer.hook cannot have multiple values
+    * [HIVE-1939] - Fix test failure in TestContribCliDriver/url_hook.q
+    * [HIVE-1944] - dynamic partition insert creating different directories for the same partition during merge
+    * [HIVE-1951] - input16_cc.q is failing in testminimrclidriver
+    * [HIVE-1952] - fix some outputs and make some tests deterministic
+    * [HIVE-1964] - add fully deterministic ORDER BY in test union22.q and input40.q
+    * [HIVE-1969] - TestMinimrCliDriver merge_dynamic_partition2 and 3 are failing on trunk
+    * [HIVE-1979] - fix hbase_bulk.m by setting HiveInputFormat
+    * [HIVE-1981] - TestHadoop20SAuthBridge failed on current trunk
+    * [HIVE-1995] - Mismatched open/commit transaction calls when using get_partition()
+    * [HIVE-1998] - Update README.txt and add missing ASF headers
+    * [HIVE-2007] - Executing queries using Hive Server is not logging to the log file specified in hive-log4j.properties
+    * [HIVE-2010] - Improve naming and README files for MetaStore upgrade scripts
+    * [HIVE-2011] - upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000
+    * [HIVE-2059] - Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
+    * [HIVE-2064] - Make call to SecurityUtil.getServerPrincipal unambiguous
+
+** Sub-task
+    * [HIVE-1361] - table/partition level statistics
+    * [HIVE-1696] - Add delegation token support to metastore
+    * [HIVE-1810] - a followup patch for changing the description of hive.exec.pre/post.hooks in conf/hive-default.xml
+    * [HIVE-1823] - upgrade the database thrift interface to allow parameters key-value pairs
+    * [HIVE-1836] - Extend the CREATE DATABASE command with DBPROPERTIES
+    * [HIVE-1842] - Add the local flag to all the map red tasks, if the query is running locally.
+
+** Task
+    * [HIVE-1526] - Hive should depend on a release version of Thrift
+    * [HIVE-1817] - Remove Hive dependency on unreleased commons-cli 2.0 Snapshot
+    * [HIVE-1876] - Update Metastore upgrade scripts to handle schema changes introduced in HIVE-1413
+    * [HIVE-1882] - Remove CHANGES.txt
+    * [HIVE-1904] - Create MetaStore schema upgrade scripts for changes made in HIVE-417
+    * [HIVE-1905] - Provide MetaStore schema upgrade scripts for changes made in HIVE-1823
+
+** Test
+    * [HIVE-1464] - improve test query performance
+    * [HIVE-1755] - JDBM diff in test caused by Hive-1641
+    * [HIVE-1774] - merge_dynamic_part's result is not deterministic
+    * [HIVE-1942] - change the value of hive.input.format to CombineHiveInputFormat for tests
+
 Release Notes - Hive - Version 0.6.0