Details
Type: Wish
Status: Closed
Priority: Minor
Resolution: Fixed
0.19.0
None
None
N/A
Reviewed
Description
Hive is a data warehouse built on top of flat files (stored primarily in HDFS). It includes:
- Data organization into tables, with logical and hash partitioning
- A Metastore to store metadata about tables, partitions, etc.
- A SQL-like query language over object data stored in tables
- DDL commands to define tables and load external data into them
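To make "logical and hash partitioning" concrete, here is a minimal sketch that groups rows first by a logical partition column and then into a fixed number of hash buckets on a second column. It is an illustration under stated assumptions, not Hive's implementation. The `ds=<value>` partition naming, the CRC32 hash, and the helper names `bucket_for` and `organize` are all hypothetical.

```python
from collections import defaultdict
import zlib


def bucket_for(value, num_buckets):
    # Hypothetical hash: CRC32 of the value's string form, modulo the bucket count.
    # Hive's real hash function may differ; this only shows that equal values
    # always land in the same bucket.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def organize(rows, partition_col, bucket_col, num_buckets):
    """Group rows by a logical partition value, then by hash bucket within it."""
    layout = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Logical partition: one group per distinct value (assumed "col=value" naming).
        part = f"{partition_col}={row[partition_col]}"
        # Hash partition: rows with the same bucket_col value share a bucket.
        layout[part][bucket_for(row[bucket_col], num_buckets)].append(row)
    return layout


rows = [
    {"ds": "2008-06-01", "userid": 1, "page": "/a"},
    {"ds": "2008-06-01", "userid": 2, "page": "/b"},
    {"ds": "2008-06-02", "userid": 1, "page": "/c"},
]
layout = organize(rows, "ds", "userid", 4)
for part, buckets in sorted(layout.items()):
    for bucket, bucket_rows in sorted(buckets.items()):
        print(part, bucket, [r["page"] for r in bucket_rows])
```

The point of the two levels is that a query restricted to one partition value can skip the others entirely, while bucketing keeps rows with the same key together inside each partition.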
Hive queries are executed using Hadoop map-reduce as the execution engine; a query may run as a single map-reduce stage or as multiple stages. Hive has a native format for tables, but it can handle any data set (for example JSON, Thrift, or XML) through an IO library framework.
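The single-stage versus multi-stage distinction can be sketched with an in-memory simulation of the map, shuffle, and reduce steps. This is plain Python mimicking the dataflow, not Hadoop's API, and it does not show how Hive actually plans a given query. The helper `run_stage` and the `views` data are hypothetical. Stage 1 behaves like `SELECT page, COUNT(1) FROM views GROUP BY page`; a second stage then orders the counts, which is one reason a query can need more than one map-reduce pass.

```python
from collections import defaultdict
from itertools import chain


def run_stage(records, mapper, reducer):
    """One map-reduce stage: map every record, shuffle by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)  # shuffle: collect all values for each key
    # Reduce keys in sorted order, as Hadoop sorts map output by key.
    return list(chain.from_iterable(reducer(k, groups[k]) for k in sorted(groups)))


# Stage 1: count rows per page (a GROUP BY).
def count_mapper(row):
    return [(row["page"], 1)]


def count_reducer(page, ones):
    return [(page, sum(ones))]


# Stage 2: order by count descending; the key (-count, page) makes the sort do the work.
def order_mapper(page_count):
    page, count = page_count
    return [((-count, page), page_count)]


def order_reducer(_key, values):
    return values


views = [{"page": p} for p in ["/b", "/c", "/c", "/a", "/c", "/b"]]
counts = run_stage(views, count_mapper, count_reducer)
ranked = run_stage(counts, order_mapper, order_reducer)
print(counts)  # [('/a', 1), ('/b', 2), ('/c', 3)]
print(ranked)  # [('/c', 3), ('/b', 2), ('/a', 1)]
```

Each stage only sees the previous stage's output, so a plan with more stages trades extra passes over the data for operations (such as a global ordering after aggregation) that one pass cannot express.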
Hive uses ANTLR for query parsing and Apache JEXL for expression evaluation, and it may use Apache Derby as an embedded database for the MetaStore. ANTLR has a BSD license, which should be compatible with the Apache License.
We are currently thinking of contributing to the 0.17 branch as a contrib project (since that is the version against which it will be tested internally), but we are looking for advice on the best release path.