Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      Tez is a new application framework built on Hadoop Yarn that can execute complex directed acyclic graphs of general data processing tasks. Here's the project's page: http://incubator.apache.org/projects/tez.html

      The interesting thing about Tez from Hive's perspective is that it will over time allow us to overcome inefficiencies in query processing due to having to express every algorithm in the map-reduce paradigm.

      The barrier to entry is pretty low as well: Tez can actually run unmodified MR jobs. But as a first step we can, without much trouble, start using more of Tez's features by taking advantage of the MRR pattern.

      MRR simply means that there can be any number of reduce stages following a single map stage, without having to write intermediate results to HDFS and re-read them in a new job. This is common when queries require multiple shuffles on keys without correlation (e.g. join - group by - window function - order by).
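A minimal in-memory sketch of the MRR idea described above, using a group-by followed by an order-by. It is a toy model, not the Tez API: the class `MrrSketch` and its methods are hypothetical names, and the "shuffle" here is just a sorted map. The point it illustrates is that the second reduce stage consumes the first stage's output directly, with no intermediate write to HDFS and no second job.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of map -> reduce -> reduce. Hypothetical names; not the Tez API.
class MrrSketch {

    // Stand-in for a shuffle: group values by key, with keys kept in sorted order.
    static <K extends Comparable<K>, V> TreeMap<K, List<V>> shuffle(List<Map.Entry<K, V>> pairs) {
        TreeMap<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Takes one department name per employee row and returns "dept=count",
    // ordered by count.
    static List<String> groupByThenOrderBy(List<String> deptPerEmployee) {
        // Map stage: emit (dept, 1) for every row.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String dept : deptPerEmployee) {
            mapped.add(new AbstractMap.SimpleEntry<>(dept, 1));
        }

        // Reduce stage 1 (group by): count rows per dept, emit (count, dept).
        List<Map.Entry<Integer, String>> counted = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            counted.add(new AbstractMap.SimpleEntry<>(e.getValue().size(), e.getKey()));
        }

        // Reduce stage 2 (order by): reads stage 1 output directly from memory
        // and emits rows in key (count) order.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : shuffle(counted).entrySet()) {
            for (String dept : e.getValue()) {
                out.add(dept + "=" + e.getKey());
            }
        }
        return out;
    }
}
```

With plain chained MR jobs, the hand-off between the two reduce stages would instead be an HDFS write followed by a second job that re-reads it.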

      For more details see the design doc here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

          Activity

          Eric Hanson added a comment -

          Overall the spec looks good.

          It is cute, but please rename this JIRA to a descriptive title with minimal or no acronyms.

          Please put author(s), date, and version number in the spec.

          Eric Hanson added a comment -

          If the following use case can be handled in Tez, that would be fantastic.

          With vectorized query execution (HIVE-4160) on ORC data I am seeing the following. A simple single-table group-by/aggregate query against 218 million rows on a 6 core machine, with one map followed by one reduce, takes about 40 seconds to run. But if you look at the Windows task manager, you see that there is about a 20 second setup period with only 5-20% CPU consumption, then there is a 5 second burst of 100% CPU consumption where all the mappers are running full steam, and then it takes another 15 seconds or so to finish the query, also with only about 5-20% CPU consumption. The query is not I/O bound.

          If the CPU slack could be eliminated, i.e. the CPU cores could run near 100% from start to finish, the query could probably run in 7 seconds.

          If you could include a discussion of this use case in the spec and how Tez will help now and/or in later Tez versions, or if other work beyond the scope of Tez is needed, that would be great.
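A rough back-of-the-envelope check of the 7-second figure, as a sketch only. It assumes the total CPU work stays the same and could be spread perfectly across the same cores, which a real query will not fully achieve. The class and method names are hypothetical; the phase durations and utilization range are the ones quoted in the comment above.

```java
// Back-of-the-envelope estimate: if every phase ran at 100% CPU, how long
// would the same amount of CPU work take? Hypothetical helper, not Hive code.
class CpuSlackEstimate {

    // durations: seconds spent in each phase.
    // utilization: fraction of total CPU used in that phase (0.0 to 1.0).
    // Returns the seconds needed if the same work ran at full utilization.
    static double idealSeconds(double[] durations, double[] utilization) {
        if (durations.length != utilization.length) {
            throw new IllegalArgumentException("one utilization value per phase is required");
        }
        double busySeconds = 0.0;
        for (int i = 0; i < durations.length; i++) {
            busySeconds += durations[i] * utilization[i];
        }
        return busySeconds;
    }
}
```

For the three phases described (20 s setup, 5 s burst at 100%, 15 s finish), `idealSeconds(new double[] {20, 5, 15}, new double[] {0.05, 1.0, 0.05})` gives about 6.75 s at the low end of the 5-20% range, and using 0.20 for the two slack phases gives about 12 s. The 7-second estimate therefore corresponds to the optimistic end of that range.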

          Edward Capriolo added a comment -

          I enjoy colorful ticket names like 'spilling like it's 1999' as much as anyone, but I think we should rope it in a bit. Use the gag in the summary, but keep the issue name clean.

          Gunther Hagleitner added a comment -

          Updated design doc is on wiki now: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

          I've also removed the acronyms from the title.

          Edward Capriolo added a comment -

          Thanks for uploading that. I am still getting up to speed a bit, silly question:

          I am looking through the Tez source code and attempting to understand its basic optimizations.

          I am looking at GroupByOrderByMRRTest.

          /**
           * Simple example that does a GROUP BY ORDER BY in an MRR job
           * Consider a query such as
           * Select DeptName, COUNT(*) as cnt FROM EmployeeTable
           * GROUP BY DeptName ORDER BY cnt;

          I notice that this test essentially runs the job with a single reducer:
          job.setNumReduceTasks(1);

          /**
           * Shuffle ensures ordering based on count of employees per department
           * hence the final reducer is a no-op and just emits the department name
           * with the employee count per department.
           */

          What mechanism makes the above optimization happen? Do all shuffles have a natural total order sort with Tez?

          Edward Capriolo added a comment -

          We’ve initially investigated to add Tez as a simple shim option to the code base. This didn’t work out mostly because Tez’ API is very different from the MR api. It does not make much sense to move the entire “execute” infrastructure to the shim layer. That would require large code changes with little benefit. Instead there will be separate “Task” implementations for MR and TEZ and hive will decide at runtime which implementation to use.

          We’re planning to have two packages:
          org.apache.hadoop.hive.ql.exec.mr
          org.apache.hadoop.hive.ql.exec.tez

          Can you please go into some detail here? My larger concern is Hive having 2x of everything, and then doing it in such a way that another integration will involve 3x of everything. What approximate percentage of the exec package will need to be duplicated? I am worried about the Hive codebase having too many silos.

          Hitesh Shah added a comment -

          Edward Capriolo, answering the questions raised in https://issues.apache.org/jira/browse/HIVE-4660?focusedCommentId=13717840&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13717840:

          • In the GroupByOrderBy example, the key-value output that the intermediate reducer generates is in the form of <count of occurrence of word, the word itself>. As part of the shuffle (similar to MapReduce), this is automatically partitioned and sorted based on the key.
          • There is no total order as such provided by tez. In this case, the final reducer has a single reduce task which ends up creating the total ordering.

          Feel free to post more questions to dev@tez.incubator.apache.org
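The two points above can be sketched with a small toy model: a shuffle partitions keys and sorts within each partition, so a global order only falls out when there is a single partition. This is an illustration under simplifying assumptions (integer keys such as per-department counts, hash partitioning by key modulo the number of reducers), and `ShuffleOrderSketch` is a hypothetical name, not part of Tez or MapReduce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy shuffle: partition keys across reducers, then sort inside each partition.
// Hypothetical helper for illustration; not the Tez or MapReduce implementation.
class ShuffleOrderSketch {

    static List<List<Integer>> partitionAndSort(List<Integer> keys, int numReducers) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        // Assumed partitioner: hash of the key modulo the number of reducers.
        for (Integer key : keys) {
            partitions.get(Math.floorMod(key.hashCode(), numReducers)).add(key);
        }
        // Each reducer sees its own keys in sorted order, nothing more.
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    // Reads reducer outputs back to back, as a client reading output files would.
    static List<Integer> concatenate(List<List<Integer>> partitions) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            all.addAll(partition);
        }
        return all;
    }
}
```

With keys 5, 2, 4, 1, 3 and one reducer, the concatenated output is 1, 2, 3, 4, 5. With two reducers it is 2, 4, 1, 3, 5: each partition is sorted, but the whole is not, which is why the example relies on a single final reduce task for the ORDER BY.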

          Bikas Saha added a comment -

          Folks, FYI, based on recent feedback we have changed the names used in some of the Tez APIs. It is a simple refactoring on the Tez side and should be a simple refactoring fix on the Pig side too. Jira for reference: TEZ-410.

          Bikas Saha added a comment -

          Sorry I meant Hive instead of Pig.

          Gunther Hagleitner added a comment -

          With the commit of HIVE-6098 this ticket is complete.

          Bikas Saha added a comment -

          Congratulations to the Hive team on this awesome piece of work!

          Eric Hanson added a comment -

          Yes, this is great work! Thanks to everybody who contributed.

          Lefty Leverenz added a comment -

          ... and there was Tez. But mere mortals don't rest on the seventh day: can the design doc be updated now?

          For example, "Functional requirements of phase I" mentions hive.optimize.tez but hasn't that been replaced by hive.execution.engine?

          Also a recent HiveConf.java doesn't contain mapreduce.framework.name which is mentioned in the design doc, but it does contain hive.compute.splits.in.am with the comment "Whether to generate the splits locally or in the AM (tez only)." Are there any more Tez config params?

          Here's the doc link: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez#HiveonTez-FunctionalrequirementsofphaseI

          Gunther Hagleitner added a comment -

          Lefty Leverenz - I've attached release notes to HIVE-6098. You should get the right settings/flags from there. I will also update the design doc.

          As for your questions:

          • Yes, hive.optimize.tez is gone in favor of hive.execution.engine=[mr, tez].
          • mapreduce.framework.name is not a Hive setting. You can use it with Hive though, and if you set it to yarn-tez, you can use Hive on MR and MR will be emulated on Tez. Probably best to not mention this, as it mostly just adds confusion.
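As a sketch of how a single setting can drive the runtime choice between separate MR and Tez task implementations discussed earlier in this thread: the interface, the two classes, and the selector below are hypothetical stand-ins, not Hive's actual classes, and the configuration is modelled as a plain map. Only the key name hive.execution.engine and its values mr and tez come from the thread; falling back to mr when the key is unset is an assumption.

```java
import java.util.Map;

// Hypothetical stand-in for an execution task; not a Hive interface.
interface ExecTask {
    String engineName();
}

// Hypothetical MR-backed implementation.
class MrExecTask implements ExecTask {
    public String engineName() {
        return "mr";
    }
}

// Hypothetical Tez-backed implementation.
class TezExecTask implements ExecTask {
    public String engineName() {
        return "tez";
    }
}

class EngineSelector {
    // Setting name taken from the thread above.
    static final String ENGINE_KEY = "hive.execution.engine";

    // Picks an implementation at runtime from the configured engine.
    // Assumption: an unset key falls back to "mr".
    static ExecTask select(Map<String, String> conf) {
        String engine = conf.getOrDefault(ENGINE_KEY, "mr");
        switch (engine) {
            case "mr":
                return new MrExecTask();
            case "tez":
                return new TezExecTask();
            default:
                throw new IllegalArgumentException("Unsupported engine: " + engine);
        }
    }
}
```

Keeping the choice behind one interface is one way to limit the duplication concern raised earlier: code outside the two engine-specific packages only depends on the shared type.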

            People

            • Assignee: Gunther Hagleitner
            • Reporter: Gunther Hagleitner
            • Votes: 0
            • Watchers: 33
