Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: 0.14.0
    • Component/s: tez
    • Labels:
      None

      Description

      This is an umbrella JIRA for Pig on Tez. More detailed subtasks will be added.

      More information can be found on the following wiki page:
      https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez

      How to set up your development environment:

      1. Check out Tez trunk.
      2. Install protobuf 2.5.0.
      3. Build Tez against Hadoop 2.2.0. (By default, it builds against Hadoop trunk, which is 3.0.0.)
      4. Install the Tez jars into the local Maven repository with "mvn install -DskipTests".
      5. Check out the Pig tez branch.
      6. Build Pig by running "ant jar-withouthadoop".
      7. Set up a single-node (or multi-node) Hadoop 2.2 cluster.
      8. Install Tez by following the instructions on the Tez homepage.
      9. Run Pig with the "-x tez" option.
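      The steps above can be scripted roughly as follows. This is a minimal sketch, not something from the ticket: the Git and SVN URLs, the checkout directory names, passing -Dhadoop.version=2.2.0 to Maven to select Hadoop 2.2.0, and the placeholder script myscript.pig are all assumptions. Steps 7 and 8 depend on your cluster and are left out. By default the script only prints each command (DRY_RUN=1); set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch of setup steps 1-6 and 9 above. Assumptions (NOT from the ticket):
#   - the Git/SVN URLs and the checkout directory names,
#   - -Dhadoop.version=2.2.0 as the Maven property selecting Hadoop 2.2.0,
#   - myscript.pig, a placeholder Pig script.
# Steps 7-8 (Hadoop 2.2 cluster, Tez install) are site-specific and omitted.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
TEZ_REPO="${TEZ_REPO:-https://github.com/apache/tez.git}"                            # assumed URL
PIG_TEZ_BRANCH="${PIG_TEZ_BRANCH:-https://svn.apache.org/repos/asf/pig/branches/tez}"  # assumed URL

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1-4: check out Tez, build it against Hadoop 2.2.0 and install the
# jars into the local Maven repository (protoc 2.5.0 must be on the PATH).
run git clone "$TEZ_REPO" tez
run mvn -f tez/pom.xml install -DskipTests -Dhadoop.version=2.2.0

# Steps 5-6: check out the Pig tez branch and build it with the
# jar-withouthadoop target.
run svn checkout "$PIG_TEZ_BRANCH" pig-tez
run ant -f pig-tez/build.xml jar-withouthadoop

# Step 9: run a Pig script with the Tez execution engine.
run pig-tez/bin/pig -x tez myscript.pig
```

      Printing by default makes it easy to review the commands against your own environment before anything is cloned or built.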

      How to run Tez tests:

      • unit test
        ant test-tez
        

        In the tez branch, exectype defaults to tez and hadoopversion defaults to 23. To run the unit tests in MR mode instead, use:

        ant test -Dexectype=mr -Dhadoopversion=20
        
      • e2e tests
        ant -Dharness.old.pig=$PIG_HOME -Dharness.hadoop.home=$HADOOP_HOME -Dharness.cluster.conf=$HADOOP_CONF -Dharness.cluster.bin=$HADOOP_BIN test-e2e-tez -Dhadoopversion=23
        
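        The three test runs above can be wrapped in one small driver. This is a sketch only: the helper function names are hypothetical, and it assumes PIG_HOME, HADOOP_HOME, HADOOP_CONF and HADOOP_BIN are exported before the e2e suite is requested. As written it prints the commands (DRY_RUN=1 by default) rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: one entry point for the three test runs listed above.
# Assumptions: the function names are hypothetical, and PIG_HOME,
# HADOOP_HOME, HADOOP_CONF and HADOOP_BIN are exported before "e2e" is used.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Unit tests in Tez mode (exectype=tez and hadoopversion=23 are the
# tez-branch defaults).
tez_unit_tests() { run ant test-tez; }

# Unit tests in MR mode.
mr_unit_tests() { run ant test -Dexectype=mr -Dhadoopversion=20; }

# End-to-end tests; fails early if a required variable is unset.
tez_e2e_tests() {
  run ant \
    -Dharness.old.pig="${PIG_HOME:?PIG_HOME is not set}" \
    -Dharness.hadoop.home="${HADOOP_HOME:?HADOOP_HOME is not set}" \
    -Dharness.cluster.conf="${HADOOP_CONF:?HADOOP_CONF is not set}" \
    -Dharness.cluster.bin="${HADOOP_BIN:?HADOOP_BIN is not set}" \
    test-e2e-tez -Dhadoopversion=23
}

case "${1:-unit}" in
  unit) tez_unit_tests ;;
  mr)   mr_unit_tests ;;
  e2e)  tez_e2e_tests ;;
  *)    echo "usage: $0 [unit|mr|e2e]" >&2; exit 2 ;;
esac
```

        Saved as, say, tez-tests.sh (the name is arbitrary), "DRY_RUN=0 bash tez-tests.sh mr" would run the MR-mode unit tests.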

        Issue Links

        1. Move JobCreationException to org.apache.pig.backend.hadoop.executionengine (Sub-task, Closed, Cheolsoo Park)
        2. Tez backend layout (Sub-task, Closed, Cheolsoo Park)
        3. Add a base abstract class for ExecutionEngine (Sub-task, Closed, Cheolsoo Park)
        4. Initial implementation of TezCompiler (Sub-task, Closed, Cheolsoo Park)
        5. Initial implementation of TezJobControlCompiler (Sub-task, Closed, Cheolsoo Park)
        6. Initial implementation of TezLauncher (Sub-task, Closed, Cheolsoo Park)
        7. Initial implementation of TezStats (Sub-task, Closed, Cheolsoo Park)
        8. Initial Implementation of PigProcessor (Sub-task, Closed, Mark Wagner)
        9. Bump hadoop version to 2.2.0 (Sub-task, Closed, Cheolsoo Park)
        10. Allow PigProcessor to handle multiple inputs (Sub-task, Closed, Mark Wagner)
        11. Add TezMiniCluster for unit tests (Sub-task, Closed, Cheolsoo Park)
        12. Empty plan fails to run (Sub-task, Closed, Daniel Dai)
        13. Make register work (Sub-task, Closed, Daniel Dai)
        14. Make order by work (Sub-task, Closed, Daniel Dai)
        15. Add test-tez target to build.xml (Sub-task, Closed, Cheolsoo Park)
        16. Make distinct work (Sub-task, Closed, Alex Bain)
        17. Make limit work (Sub-task, Closed, Alex Bain)
        18. Pig should be able to submit multiple DAG (Sub-task, Closed, Daniel Dai)
        19. e2e test for tez (Sub-task, Closed, Daniel Dai)
        20. Add diagnostic information to TezStats (Sub-task, Closed, Cheolsoo Park)
        21. Fix tez Checkin_2 (Sub-task, Closed, Daniel Dai)
        22. Initial implementation of combiner optimization (Sub-task, Closed, Cheolsoo Park)
        23. Fix tez branch compilation with Hadoop 1.0 (Sub-task, Closed, Cheolsoo Park)
        24. Implement optimizations for LIMIT (Sub-task, Closed, Alex Bain)
        25. Unique Tez staging dir should be used for different users (Sub-task, Resolved, Daniel Dai)
        26. Implement combiner optimizations for DISTINCT (Sub-task, Closed, Alex Bain)
        27. Make Tez work with security (Sub-task, Closed, Rohini Palaniswamy)
        28. Make split work with Tez (Sub-task, Closed, Rohini Palaniswamy)
        29. Make union work with tez (Sub-task, Closed, Cheolsoo Park)
        30. Fix dependencies in ivy.xml (Sub-task, Closed, Cheolsoo Park)
        31. Port Package refactoring to Tez branch (Sub-task, Closed, Mark Wagner)
        32. Fix e2e Operator_1, 5, Checkin_3, and Join_1 (Sub-task, Closed, Cheolsoo Park)
        33. Move POSimpleTezLoad under tez package (Sub-task, Closed, Cheolsoo Park)
        34. Tear down TezSessions when Pig exits (Sub-task, Closed, Rohini Palaniswamy)
        35. Add counters to TezStats (Sub-task, Closed, Cheolsoo Park)
        36. Implement replicated join in Tez (Sub-task, Closed, Cheolsoo Park)
        37. Fix e2e tests Operators_3, Operators_5 (Sub-task, Closed, Daniel Dai)
        38. Add order by string, descending order e2e tests (Sub-task, Closed, Daniel Dai)
        39. Replace broadcast edges with scatter/gather edges in union (Sub-task, Closed, Cheolsoo Park)
        40. TezCompiler adds duplicate predecessors of blocking operators to TezPlan (Sub-task, Closed, Rohini Palaniswamy)
        41. Fix intermittent test failure Join_1 (Sub-task, Closed, Daniel Dai)
        42. Make combiners, custom partitioners and secondary key sort work for multiple outputs (Sub-task, Closed, Rohini Palaniswamy)
        43. Implement STREAM in Tez (Sub-task, Closed, Alex Bain)
        44. Improve performance of order-by (Sub-task, Closed, Daniel Dai)
        45. Make accumulator UDF work in Tez (Sub-task, Closed, Cheolsoo Park)
        46. Implement skewed join in Tez (Sub-task, Closed, Cheolsoo Park)
        47. Implement merge join in Tez (Sub-task, Closed, Daniel Dai)
        48. Use Tez ObjectRegistry to cache FRJoin map and WeightedRangePartitioner map (Sub-task, Closed, Rohini Palaniswamy)
        49. TEZ-41 break pig-tez (Sub-task, Closed, Daniel Dai)
        50. Fix store after load (Sub-task, Closed, Daniel Dai)
        51. Fix skewed join e2e tests (Sub-task, Closed, Cheolsoo Park)
        52. Change tez version dependency as a result of TEZ-739 (Sub-task, Closed, Hitesh Shah)
        53. Fix split + skewed join (Sub-task, Resolved, Rohini Palaniswamy)
        54. Fix TestSkewedJoin in tez mode (Sub-task, Closed, Cheolsoo Park)
        55. Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex (Sub-task, Closed, Rohini Palaniswamy)
        56. Set MR runtime settings on tez runtime (Sub-task, Closed, Rohini Palaniswamy)
        57. Use VertexGroup and Alias vertex for union (Sub-task, Closed, Cheolsoo Park)
        58. Support for multiquery off in Tez (Sub-task, Closed, Rohini Palaniswamy)
        59. Add support for non-Java UDF's (Sub-task, Closed, Alex Bain)
        60. Make scalar work (Sub-task, Closed, Daniel Dai)
        61. Fix desc order by in Tez (Sub-task, Closed, Daniel Dai)
        62. CombinerOptimizer should not optimize cogroup case in tez (Sub-task, Closed, Daniel Dai)
        63. Outer join fail on tez (Sub-task, Closed, Daniel Dai)
        64. Properties aren't propagated to edges or vertices in Tez (Sub-task, Resolved, Mark Wagner)
        65. Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex (Sub-task, Closed, Rohini Palaniswamy)
        66. Work with TEZ-668 which allows starting and closing of inputs and outputs (Sub-task, Closed, Rohini Palaniswamy)
        67. TezCompiler.visitUnion() doesn't add compiled TezOp to phyToTezOpMap (Sub-task, Closed, Cheolsoo Park)
        68. Scripting UDF is broken after PIG-3629 (Sub-task, Closed, Daniel Dai)
        69. Tez mini cluster tests run for a very long time with TezSession reuse on (Sub-task, Closed, Cheolsoo Park)
        70. Fix TestTezCompiler#testReplicatedJoinInReducer (Sub-task, Closed, Cheolsoo Park)
        71. TezResourceManager should not be a singleton (Sub-task, Closed, Daniel Dai)
        72. POReservoirSample should handle endOfAllInput flag (Sub-task, Closed, Daniel Dai)
        73. Multiquery with FRJoin fail (Sub-task, Closed, Daniel Dai)
        74. NPE when POStream is not in the leaf vertex (Sub-task, Closed, Daniel Dai)
        75. tuple in POStream binaryInputQueue keep changing (Sub-task, Closed, Daniel Dai)
        76. Several changes in Tez e2e (Sub-task, Closed, Daniel Dai)
        77. POValueInputTez should handle getNextTuple even after reader.next() returns null (Sub-task, Closed, Daniel Dai)
        78. Parallelism specified by user is not honored if default parallelism is set to a higher value (Sub-task, Closed, Cheolsoo Park)
        79. Fix some memory leaks affecting container reuse (Sub-task, Closed, Rohini Palaniswamy)
        80. TestCustomPartitioner is broken in tez branch (Sub-task, Closed, Cheolsoo Park)
        81. POPoissonSample should handle endOfAllInput flag (Sub-task, Closed, Daniel Dai)
        82. Implement CROSS in Tez (Sub-task, Closed, Rohini Palaniswamy)
        83. Implement RANK in Tez (Sub-task, Closed, Rohini Palaniswamy)
        84. Remove reference to BroadcastKVReader as it is removed in TEZ-911 (Sub-task, Closed, Rohini Palaniswamy)
        85. Make custom counter work (Sub-task, Closed, Daniel Dai)
        86. Implement mapside cogroup in Tez (Sub-task, Resolved, Unassigned)
        87. Organize tez code into subpackages (Sub-task, Closed, Rohini Palaniswamy)
        88. Honor Mapreduce Distributed Cache settings and localize resources in Tez (Sub-task, Closed, Rohini Palaniswamy)
        89. Pig on tez job hangs when AM has a failure and Multiquery fixes (Sub-task, Closed, Rohini Palaniswamy)
        90. Pig script encounters error with Tez MemoryDistributor (Sub-task, Resolved, Unassigned)
        91. e2e test Rank_9 fail (Sub-task, Closed, Daniel Dai)
        92. e2e tests run all tests even execonly flag does not match (Sub-task, Closed, Daniel Dai)
        93. Fix MergeJoin_8 failure (Sub-task, Closed, Daniel Dai)
        94. UdfDistributedCache_1 fails in tez branch (Sub-task, Closed, Cheolsoo Park)
        95. Global sort is not working (order by) Pig over Tez (Sub-task, Resolved, Unassigned)
        96. Hash join followed by replicated join fails in Tez mode (Sub-task, Closed, Cheolsoo Park)
        97. Fix memory leak with PigTezLogger (Sub-task, Closed, Rohini Palaniswamy)
        98. Fix UnionOptimizer bug with expressions and MR compressions settings not honored (Sub-task, Closed, Rohini Palaniswamy)
        99. Implement PPNL for Tez mode (Pig side changes) (Sub-task, Closed, Cheolsoo Park)
        100. PigRecordWriter throws exception in Tez mode (Sub-task, Closed, Cheolsoo Park)
        101. Fix e2e test failure CastScalar_11 (Sub-task, Closed, Daniel Dai)
        102. Skewed join followed by replicated join fails in Tez (Sub-task, Closed, Cheolsoo Park)
        103. Fix MR unit tests on tez branch (Sub-task, Closed, Daniel Dai)
        104. Pig on tez fails to run in Oozie in secure cluster (Sub-task, Closed, Rohini Palaniswamy)
        105. Get TezStats working for Oozie (Sub-task, Closed, Rohini Palaniswamy)
        106. Make the interval of DAGStatus report configurable (Sub-task, Closed, Cheolsoo Park)
        107. New interface for resetting static variables for jvm reuse (Sub-task, Closed, Rohini Palaniswamy)
        108. Fix compilation failure due in Pig on Tez due to TEZ-1127 change (Sub-task, Resolved, Unassigned)
        109. Refactor TezJob and TezLauncher (Sub-task, Closed, Cheolsoo Park)
        110. Make Streaming UDF work in Tez (Sub-task, Closed, Daniel Dai)
        111. ObjectCache cause ClassCastException (Sub-task, Closed, Cheolsoo Park)
        112. Change from TezJobConfig to TezRuntimeConfiguration (Sub-task, Closed, Rohini Palaniswamy)
        113. Accumulator UDF throws OOM in Tez (Sub-task, Closed, Rohini Palaniswamy)
        114. NPE in packager when union + group-by followed by replicated join in Tez (Sub-task, Closed, Rohini Palaniswamy)
        115. Implement merge cogroup in Tez (Sub-task, Closed, Daniel Dai)
        116. Add Native operator to tez (Sub-task, Closed, Daniel Dai)
        117. Create a target to run mr and tez unit test in one shot (Sub-task, Closed, Daniel Dai)
        118. Pin Tez to 0.5.0 release (Sub-task, Closed, Cheolsoo Park)
        119. Intermediate reducer parallelism in Tez should be higher (Sub-task, Closed, Rohini Palaniswamy)
        120. Mapreduce ACLs should be translated to Tez ACLs (Sub-task, Closed, Rohini Palaniswamy)
        121. Reset UDFContext state before OutputCommitter invocations in Tez (Sub-task, Closed, Rohini Palaniswamy)
        122. Fix few issues related to Union, CROSS and auto parallelism in Tez (Sub-task, Closed, Rohini Palaniswamy)
        123. PigProcessor does not set pig.datetime.default.tz (Sub-task, Resolved, Rohini Palaniswamy)
        124. ObjectCache should use ProcessorContext.getObjectRegistry() (Sub-task, Resolved, Rohini Palaniswamy)

          Activity

          Julien Le Dem added a comment -

          Here is the work that Achal did for Pig-on-Tez
          https://github.com/achalsoni81/pigeon

          Daniel Dai added a comment -

          Linking this JIRA to 0.15 since several subprojects are still in progress.

          Daniel Dai added a comment -

          Resolving this ticket since the major work was completed in 0.14. New improvements and bug fixes should be filed as separate JIRAs.


            People

            • Assignee: Cheolsoo Park
            • Reporter: Cheolsoo Park
            • Votes: 0
            • Watchers: 29
