[HIVE-4660] Let there be Tez - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: None
Labels: None

Description

Tez is a new application framework built on Hadoop YARN that can execute complex directed acyclic graphs (DAGs) of general data processing tasks. Here's the project page: http://incubator.apache.org/projects/tez.html

The interesting thing about Tez from Hive's perspective is that, over time, it will allow us to overcome the query-processing inefficiencies that come from having to express every algorithm in the map-reduce paradigm.

The barrier to entry is pretty low as well: Tez can run unmodified MR jobs. As a first step, though, we can start using more of Tez's features without much trouble by taking advantage of the MRR pattern.

MRR simply means that any number of reduce stages can follow a single map stage, without having to write intermediate results to HDFS and re-read them in a new job. This is common when queries require multiple shuffles on uncorrelated keys (e.g. join - group by - window function - order by).
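To make the difference concrete, here is a toy model (not Hive's or Tez's actual planner) that counts jobs and intermediate HDFS hand-offs for the example query shape above. The names `Plan`, `plan_mr` and `plan_mrr` are hypothetical and exist only for this illustration. It assumes the simplest case, where each shuffle in plain map-reduce needs its own job.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    jobs: List[List[str]]  # each job is an ordered list of stage names
    hdfs_handoffs: int     # intermediate results written to HDFS and re-read


def plan_mr(shuffles: List[str]) -> Plan:
    """Plain map-reduce (simplified): one map + one reduce job per shuffle."""
    jobs = []
    for i, op in enumerate(shuffles):
        # Every job after the first starts by re-reading the previous
        # job's output from HDFS in an otherwise trivial map stage.
        map_stage = "map(scan)" if i == 0 else "map(re-read from HDFS)"
        jobs.append([map_stage, f"reduce({op})"])
    return Plan(jobs=jobs, hdfs_handoffs=max(len(jobs) - 1, 0))


def plan_mrr(shuffles: List[str]) -> Plan:
    """MRR: a single map stage followed by a chain of reduce stages."""
    if not shuffles:
        return Plan(jobs=[], hdfs_handoffs=0)
    stages = ["map(scan)"] + [f"reduce({op})" for op in shuffles]
    # Reduce stages feed each other directly, so nothing is handed off via HDFS.
    return Plan(jobs=[stages], hdfs_handoffs=0)


if __name__ == "__main__":
    query = ["join", "group by", "window function", "order by"]
    for name, plan in (("MR", plan_mr(query)), ("MRR", plan_mrr(query))):
        print(f"{name}: {len(plan.jobs)} job(s), "
              f"{plan.hdfs_handoffs} intermediate HDFS hand-off(s)")
        for job in plan.jobs:
            print("   " + " -> ".join(job))
```

In this model the four-shuffle query needs four chained jobs and three HDFS hand-offs under plain map-reduce, but a single job with one map and four reduce stages under MRR.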

For more details see the design doc here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

Issue Links

is blocked by

HIVE-5065 Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask

Resolved

HIVE-4916 Add TezWork

Resolved

HIVE-4917 Tez Job Monitoring

Resolved

HIVE-4918 Tez job submission

Resolved

HIVE-5003 Localize hive exec jar for tez

Resolved

HIVE-5004 Localize user jars/files/archives

Resolved

HIVE-5005 Localize hash tables for mapjoin/local work

Resolved

HIVE-5007 Add TezTaskCompiler

Resolved

HIVE-5008 Reuse MapRedUtils to generate Map/ReduceWork

Resolved

HIVE-5040 Yarn resource names cannot contain slashes

Resolved

HIVE-5041 Retrieve and display diagnostic information when TezTask fails on cluster

Resolved

HIVE-5042 Allow MiniMr tests to be run on MiniTezCluster

Resolved

HIVE-5043 Re-enable analyze command for Tez

Resolved

HIVE-5045 Make ctrl-c work with Tez

Resolved

HIVE-5052 Set parallelism when generating the tez tasks

Resolved

HIVE-5053 Let user override the parallelism of each tez task

Resolved

HIVE-5058 Fix NPE issue with DAG submission in TEZ

Resolved

HIVE-5073 Fix problem with multiple root tasks in tez

Resolved

HIVE-5076 Subsequent reduce stages fail when executed in tez

Resolved

HIVE-5095 Hive needs new operator walker for parallelization/optimization for tez

Resolved

HIVE-5097 Update DagUtils to reflect changes in Tez API

Resolved

HIVE-5103 Job numbers are incorrectly displayed in Tez

Resolved

HIVE-5108 Join on Tez fails in certain cases

Resolved

HIVE-5148 Jam sessions w/ Tez

Resolved

HIVE-5151 Going green: Container re-cycling in Tez

Resolved

HIVE-5183 Tez EdgeProperty class has changed

Resolved

HIVE-5184 Load filesystem, ugi, metastore client at tez session startup

Resolved

HIVE-5270 Enable hash joins using tez

Resolved

HIVE-5271 Convert join op to a map join op in the planning phase

Resolved

HIVE-5367 fix hive-tez build after tez updates

Resolved

HIVE-5368 Changes to work creation for tez

Resolved

HIVE-5386 Update DagUtils/Tez task to reflect tez api changes

Resolved

HIVE-5387 Need to create edge properties for hive on tez

Resolved

HIVE-5388 Use a custom LogicalIOProcessor in Tez vertex

Resolved

HIVE-5389 custom LogicalIOProcessor - map record processor

Resolved

HIVE-5390 custom LogicalIOProcessor - reduce record processor

Resolved

HIVE-5404 Remove changes from HIVE-5184

Resolved

HIVE-5409 Enable vectorization for Tez

Resolved

HIVE-5437 Add map/reduce input map to MapWork/ReduceWork for multi input

Resolved

HIVE-5439 Set input edge map for map join operator in Tez

Resolved

HIVE-5442 Plumbing for map join in tez

Resolved

HIVE-5451 Use IOContext instead of HADOOPMAPFILENAME in MapOperator setup for Tez

Resolved

HIVE-5505 PerfLogger statements for Tez

Resolved

HIVE-5522 Move split generation into the AM

Resolved

HIVE-5533 Re-connect Tez session after AM timeout

Resolved

HIVE-5543 Running the mini tez cluster for tez unit tests

Resolved

HIVE-5544 Print task totals instead of percentages on Tez

Resolved

HIVE-5551 Create perf logger statements for orc init/split creation

Resolved

HIVE-5553 Fix vertex start logging for tez

Resolved

HIVE-5561 Clear work map for container reuse on tez

Resolved

HIVE-5585 Fix "empty hashtable" problem with container reuse on tez

Resolved

HIVE-5586 Call createSplits with multiple paths if partition specs match on Tez

Resolved

HIVE-5587 Change MapRecordProcessor to use VectorMapOperator when necessary on Tez

Resolved

HIVE-5591 Use TezGroupedSplits to combine splits based on headroom in Tez

Resolved

HIVE-5608 tez AM should be able to serialize orc footer in splits

Resolved

HIVE-5620 Tez job progress printing stops after a specific amount of time

Resolved

HIVE-5638 NPE in ConvertJoinMapJoin on Tez

Resolved

HIVE-5645 Cannot compile tests on tez branch

Resolved

HIVE-5647 Fix failing mapreduce tests on the tez branch

Resolved

HIVE-5650 Print yarn app id when running Tez dag

Resolved

HIVE-5651 union_view.q is failing on tez branch

Resolved

HIVE-5688 TestCliDriver compilation fails on tez branch.

Resolved

HIVE-5689 Add some simple MRR tests

Resolved

HIVE-5703 While using tez, Qtest needs to close session before creating a new one

Resolved

HIVE-5719 Remove some overly noisy perflogger statements from Tez codepath

Resolved

HIVE-5734 Enable merge/move tasks for Tez

Resolved

HIVE-5735 Enable noscan/partialscan on Tez

Resolved

HIVE-5736 Fix assertion in Operator.java

Resolved

HIVE-5738 Need to clear global work in ExecMapper/ExecReducer for container reuse in tez

Resolved

HIVE-5766 Update call to Tez DAG status to reflect updated API

Resolved

HIVE-5770 Switch merge tasks to use tez

Resolved

HIVE-5772 Print message for union operators

Resolved

HIVE-5778 mapjoin hints on Tez

Resolved

HIVE-5808 broadcast join in tez discards duplicate records from the broadcasted table

Resolved

HIVE-5832 Add shutdown hook to stop tez dag/session if jvm dies

Resolved

HIVE-5862 While running some queries on large data using tez, we OOM.

Resolved

HIVE-5889 Add counter based stats aggregator for tez

Resolved

HIVE-5984 Multi insert statement fails on Tez

Resolved

HIVE-6001 Tez: UDFs are not properly localized

Resolved

HIVE-6011 correlation optimizer unit tests are failing on tez

Resolved

HIVE-6014 Stage ids differ in the tez branch

Resolved

HIVE-6019 Tez: Analyze command fails with dbclass=counter

Resolved

HIVE-6038 Fix Tez branch to properly compile against hadoop-1 profile

Resolved

HIVE-6055 Cleanup aisle tez

Resolved

HIVE-6077 Fixing a couple of orc unit tests on tez

Resolved

HIVE-6078 Choosing conditional task for merging files is not deterministic in tez

Resolved

HIVE-6079 Hadoop 1 tests fail in tez branch

Resolved

HIVE-6081 Dag utils in tez has incorrect dependency on Hadoop20 shims

Resolved

HIVE-6085 Tez changed test parse tests output

Resolved

HIVE-6097 Sessions on Tez NPE when quitting CLI

Resolved

HIVE-6101 Classpath is incorrect for hadoop-1 tests on tez

Resolved

HIVE-6102 Update golden files for extended explain on mapjoin

Resolved

HIVE-6103 Change hive.optimize.tez to hive.execution.engine with [mr, tez] values

Resolved

HIVE-6106 update golden files for tez

Resolved

HIVE-6135 Fix merge error on tez branch (TestCompareCliDriver)

Resolved

HIVE-6138 Tez: Add some additional comments to clarify intent

Resolved

HIVE-6168 Fix some javadoc issues on Tez branch

Resolved

HIVE-6169 Update tez specific golden files after merge

Resolved

HIVE-6172 Whitespaces and comments on Tez

Resolved

HIVE-5080 Add hook to do additional optimization on the operator plan in tez

Resolved

HIVE-5378 Need to move SetReducerParallelism to the optimize package.

Resolved

HIVE-5882 Reduce logging verbosity on Tez

Resolved

HIVE-5948 Output file name is random when using Tez with "insert overwrite local directory"

Resolved

HIVE-6080 Non-deterministic stage dependencies in tez

Resolved

HIVE-4810 Refactor exec package

Closed

HIVE-4811 (Slightly) break up the SemanticAnalyzer monstrosity

Closed

HIVE-4812 Logical explain plan

Closed

HIVE-4843 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

Closed

HIVE-4826 Setup build infrastructure for tez

Resolved

HIVE-6098 Merge Tez branch into trunk

Resolved

HIVE-5639 Allow caching of Orc footers in Tez AM

Resolved

HIVE-4825 Separate MapredWork into MapWork and ReduceWork

Closed

is related to

TEZ-4370 Apache Tez adoption umbrella

Open

TEZ-135 Let there be hive

Closed

relates to

HIVE-6128 Add tez variables to hive-default.xml

Resolved

links to

Design document/Spec

(107 is blocked by, 2 is related to, 1 relates to, 1 links to)

People

Assignee: Gunther Hagleitner

Reporter: Gunther Hagleitner

Votes: 0

Watchers: 35

Dates

Created: 05/Jun/13 07:42

Updated: 12/Jan/22 12:52

Resolved: 10/Jan/14 02:43