Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Tez is a new application framework built on Hadoop YARN that can execute complex directed acyclic graphs (DAGs) of general data processing tasks. The project page is here: http://incubator.apache.org/projects/tez.html
From Hive's perspective, the interesting thing about Tez is that it will, over time, let us overcome the query-processing inefficiencies that come from having to express every algorithm in the map-reduce paradigm.
The barrier to entry is low as well: Tez can run unmodified MR jobs. As a first step, though, we can start using more of Tez's features without much trouble by taking advantage of the MRR (map-reduce-reduce) pattern.
MRR simply means that any number of reduce stages can follow a single map stage, without writing intermediate results to HDFS and re-reading them in a new job. This is common when a query requires multiple shuffles on uncorrelated keys (e.g. join, group by, window function, order by).
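To make the difference concrete, here is a minimal sketch of a toy cost model. It is not Hive or Tez code: `StagePlan`, `plan_as_mr` and `plan_as_mrr` are hypothetical names invented for illustration. It assumes that plain map-reduce needs one job per shuffle, with an HDFS write and re-read between consecutive jobs, and that MRR chains all reduce stages after a single map stage.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StagePlan:
    """Toy summary of how a chain of shuffles is executed (illustrative only)."""
    jobs: int              # separately submitted jobs (or DAGs)
    map_stages: int
    reduce_stages: int
    hdfs_round_trips: int  # intermediate results written to HDFS and re-read


def plan_as_mr(shuffles: Sequence[str]) -> StagePlan:
    """Plain map-reduce: one job (one map + one reduce stage) per shuffle."""
    n = len(shuffles)
    # Every job except the last hands its output to the next job via HDFS.
    return StagePlan(jobs=n, map_stages=n, reduce_stages=n,
                     hdfs_round_trips=max(n - 1, 0))


def plan_as_mrr(shuffles: Sequence[str]) -> StagePlan:
    """MRR: a single map stage followed by one reduce stage per shuffle."""
    n = len(shuffles)
    if n == 0:
        return StagePlan(jobs=0, map_stages=0, reduce_stages=0,
                         hdfs_round_trips=0)
    # All stages live in one DAG, so no intermediate HDFS round trips.
    return StagePlan(jobs=1, map_stages=1, reduce_stages=n,
                     hdfs_round_trips=0)


# The example from the description: four shuffles on uncorrelated keys.
query = ["join", "group by", "window function", "order by"]
print("MR: ", plan_as_mr(query))
print("MRR:", plan_as_mrr(query))
```

Under these assumptions the example query takes four jobs and three intermediate HDFS round trips as plain MR, versus one DAG with no intermediate HDFS round trips as MRR. Real plans depend on the optimizer and on the query, so the numbers are only indicative.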
For more details see the design doc here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez
Issue Links
- is blocked by
  - HIVE-5065 Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask (Resolved)
  - HIVE-4916 Add TezWork (Resolved)
  - HIVE-4917 Tez Job Monitoring (Resolved)
  - HIVE-4918 Tez job submission (Resolved)
  - HIVE-5003 Localize hive exec jar for tez (Resolved)
  - HIVE-5004 Localize user jars/files/archives (Resolved)
  - HIVE-5005 Localize hash tables for mapjoin/local work (Resolved)
  - HIVE-5007 Add TezTaskCompiler (Resolved)
  - HIVE-5008 Reuse MapRedUtils to generate Map/ReduceWork (Resolved)
  - HIVE-5040 Yarn resource names cannot contain slashes (Resolved)
  - HIVE-5041 Retrieve and display diagnostic information when TezTask fails on cluster (Resolved)
  - HIVE-5042 Allow MiniMr tests to be run on MiniTezCluster (Resolved)
  - HIVE-5043 Re-enable analyze command for Tez (Resolved)
  - HIVE-5045 Make ctrl-c work with Tez (Resolved)
  - HIVE-5052 Set parallelism when generating the tez tasks (Resolved)
  - HIVE-5053 Let user override the parallelism of each tez task (Resolved)
  - HIVE-5058 Fix NPE issue with DAG submission in TEZ (Resolved)
  - HIVE-5073 Fix problem with multiple root tasks in tez (Resolved)
  - HIVE-5076 Subsequent reduce stages fail when executed in tez (Resolved)
  - HIVE-5095 Hive needs new operator walker for parallelization/optimization for tez (Resolved)
  - HIVE-5097 Update DagUtils to reflect changes in Tez API (Resolved)
  - HIVE-5103 Job numbers are incorrectly displayed in Tez (Resolved)
  - HIVE-5108 Join on Tez fails in certain cases (Resolved)
  - HIVE-5148 Jam sessions w/ Tez (Resolved)
  - HIVE-5151 Going green: Container re-cycling in Tez (Resolved)
  - HIVE-5183 Tez EdgeProperty class has changed (Resolved)
  - HIVE-5184 Load filesystem, ugi, metastore client at tez session startup (Resolved)
  - HIVE-5270 Enable hash joins using tez (Resolved)
  - HIVE-5271 Convert join op to a map join op in the planning phase (Resolved)
  - HIVE-5367 fix hive-tez build after tez updates (Resolved)
  - HIVE-5368 Changes to work creation for tez (Resolved)
  - HIVE-5386 Update DagUtils/Tez task to reflect tez api changes (Resolved)
  - HIVE-5387 Need to create edge properties for hive on tez (Resolved)
  - HIVE-5388 Use a custom LogicalIOProcessor in Tez vertex (Resolved)
  - HIVE-5389 custom LogicalIOProcessor - map record processor (Resolved)
  - HIVE-5390 custom LogicalIOProcessor - reduce record processor (Resolved)
  - HIVE-5404 Remove changes from HIVE-5184 (Resolved)
  - HIVE-5409 Enable vectorization for Tez (Resolved)
  - HIVE-5437 Add map/reduce input map to MapWork/ReduceWork for multi input (Resolved)
  - HIVE-5439 Set input edge map for map join operator in Tez (Resolved)
  - HIVE-5442 Plumbing for map join in tez (Resolved)
  - HIVE-5451 Use IOContext instead of HADOOPMAPFILENAME in MapOperator setup for Tez (Resolved)
  - HIVE-5505 PerfLogger statements for Tez (Resolved)
  - HIVE-5522 Move split generation into the AM (Resolved)
  - HIVE-5533 Re-connect Tez session after AM timeout (Resolved)
  - HIVE-5543 Running the mini tez cluster for tez unit tests (Resolved)
  - HIVE-5544 Print task totals instead of percentages on Tez (Resolved)
  - HIVE-5551 Create perf logger statements for orc init/split creation (Resolved)
  - HIVE-5553 Fix vertex start logging for tez (Resolved)
  - HIVE-5561 Clear work map for container reuse on tez (Resolved)
  - HIVE-5585 Fix "empty hashtable" problem with container reuse on tez (Resolved)
  - HIVE-5586 Call createSplits with multiple paths if partition specs match on Tez (Resolved)
  - HIVE-5587 Change MapRecordProcessor to use VectorMapOperator when necessary on Tez (Resolved)
  - HIVE-5591 Use TezGroupedSplits to combine splits based on headroom in Tez (Resolved)
  - HIVE-5608 tez AM should be able to serialize orc footer in splits (Resolved)
  - HIVE-5620 Tez job progress printing stops after a specific amount of time (Resolved)
  - HIVE-5638 NPE in ConvertJoinMapJoin on Tez (Resolved)
  - HIVE-5645 Cannot compile tests on tez branch (Resolved)
  - HIVE-5647 Fix failing mapreduce tests on the tez branch (Resolved)
  - HIVE-5650 Print yarn app id when running Tez dag (Resolved)
  - HIVE-5651 union_view.q is failing on tez branch (Resolved)
  - HIVE-5688 TestCliDriver compilation fails on tez branch. (Resolved)
  - HIVE-5689 Add some simple MRR tests (Resolved)
  - HIVE-5703 While using tez, Qtest needs to close session before creating a new one (Resolved)
  - HIVE-5719 Remove some overly noisy perflogger statements from Tez codepath (Resolved)
  - HIVE-5734 Enable merge/move tasks for Tez (Resolved)
  - HIVE-5735 Enable noscan/partialscan on Tez (Resolved)
  - HIVE-5736 Fix assertion in Operator.java (Resolved)
  - HIVE-5738 Need to clear global work in ExecMapper/ExecReducer for container reuse in tez (Resolved)
  - HIVE-5766 Update call to Tez DAG status to reflect updated API (Resolved)
  - HIVE-5770 Switch merge tasks to use tez (Resolved)
  - HIVE-5772 Print message for union operators (Resolved)
  - HIVE-5778 mapjoin hints on Tez (Resolved)
  - HIVE-5808 broadcast join in tez discards duplicate records from the broadcasted table (Resolved)
  - HIVE-5832 Add shutdown hook to stop tez dag/session if jvm dies (Resolved)
  - HIVE-5862 While running some queries on large data using tez, we OOM. (Resolved)
  - HIVE-5889 Add counter based stats aggregator for tez (Resolved)
  - HIVE-5984 Multi insert statement fails on Tez (Resolved)
  - HIVE-6001 Tez: UDFs are not properly localized (Resolved)
  - HIVE-6011 correlation optimizer unit tests are failing on tez (Resolved)
  - HIVE-6014 Stage ids differ in the tez branch (Resolved)
  - HIVE-6019 Tez: Analyze command fails with dbclass=counter (Resolved)
  - HIVE-6038 Fix Tez branch to properly compile against hadoop-1 profile (Resolved)
  - HIVE-6055 Cleanup aisle tez (Resolved)
  - HIVE-6077 Fixing a couple of orc unit tests on tez (Resolved)
  - HIVE-6078 Choosing conditional task for merging files is not deterministic in tez (Resolved)
  - HIVE-6079 Hadoop 1 tests fail in tez branch (Resolved)
  - HIVE-6081 Dag utils in tez has incorrect dependency on Hadoop20 shims (Resolved)
  - HIVE-6085 Tez changed test parse tests output (Resolved)
  - HIVE-6097 Sessions on Tez NPE when quitting CLI (Resolved)
  - HIVE-6101 Classpath is incorrect for hadoop-1 tests on tez (Resolved)
  - HIVE-6102 Update golden files for extended explain on mapjoin (Resolved)
  - HIVE-6103 Change hive.optimize.tez to hive.execution.engine with [mr, tez] values (Resolved)
  - HIVE-6106 update golden files for tez (Resolved)
  - HIVE-6135 Fix merge error on tez branch (TestCompareCliDriver) (Resolved)
  - HIVE-6138 Tez: Add some additional comments to clarify intent (Resolved)
  - HIVE-6168 Fix some javadoc issues on Tez branch (Resolved)
  - HIVE-6169 Update tez specific golden files after merge (Resolved)
  - HIVE-6172 Whitespaces and comments on Tez (Resolved)
  - HIVE-5080 Add hook to do additional optimization on the operator plan in tez (Resolved)
  - HIVE-5378 Need to move SetReducerParallelism to the optimize package. (Resolved)
  - HIVE-5882 Reduce logging verbosity on Tez (Resolved)
  - HIVE-5948 Output file name is random when using Tez with "insert overwrite local directory" (Resolved)
  - HIVE-6080 Non-deterministic stage dependencies in tez (Resolved)
  - HIVE-4810 Refactor exec package (Closed)
  - HIVE-4811 (Slightly) break up the SemanticAnalyzer monstrosity (Closed)
  - HIVE-4812 Logical explain plan (Closed)
  - HIVE-4843 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability (Closed)
  - HIVE-4826 Setup build infrastructure for tez (Resolved)
  - HIVE-6098 Merge Tez branch into trunk (Resolved)
  - HIVE-5639 Allow caching of Orc footers in Tez AM (Resolved)
  - HIVE-4825 Separate MapredWork into MapWork and ReduceWork (Closed)
- relates to
  - HIVE-6128 Add tez variables to hive-default.xml (Resolved)