diff --git README.txt README.txt
index 7d00f56..ceed160 100644
--- README.txt
+++ README.txt
@@ -1,14 +1,27 @@
-Apache Hive @VERSION@
-=================
-
-Apache Hive is a data warehouse system for Hadoop that facilitates
-easy data summarization, ad-hoc querying and analysis of large
-datasets stored in Hadoop compatible file systems. Hive provides a
-mechanism to put structure on this data and query the data using a
-SQL-like language called HiveQL. At the same time this language also
-allows traditional map/reduce programmers to plug in their custom
-mappers and reducers when it is inconvenient or inefficient to express
-this logic in HiveQL.
+Apache Hive (TM) @VERSION@
+======================
+
+The Apache Hive (TM) data warehouse software facilitates querying and
+managing large datasets residing in distributed storage. Built on top
+of Apache Hadoop (TM), it provides:
+
+* Tools to enable easy data extract/transform/load (ETL)
+
+* A mechanism to impose structure on a variety of data formats
+
+* Access to files stored either directly in Apache HDFS (TM) or in other
+  data storage systems such as Apache HBase (TM)
+
+* Query execution via MapReduce
+
+Hive defines a simple SQL-like query language, called QL, that enables
+users familiar with SQL to query the data. At the same time, this
+language also allows programmers who are familiar with the MapReduce
+framework to be able to plug in their custom mappers and reducers to
+perform more sophisticated analysis that may not be supported by the
+built-in capabilities of the language. QL can also be extended with
+custom scalar functions (UDF's), aggregations (UDAF's), and table
+functions (UDTF's).
 
 Please note that Hadoop is a batch processing system and Hadoop jobs
 tend to have high latency and incur substantial overheads in job