PIG-3446

Umbrella jira for Pig on Tez

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: 0.14.0
    • Component/s: tez
    • Labels: None

    Description

      This is an umbrella JIRA for Pig on Tez. More detailed subtasks will be added.

      More information can be found on the following wiki page:
      https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez

      How to set up your development environment:

      1. Check out Tez trunk.
      2. Install protobuf 2.5.0.
      3. Build Tez with Hadoop 2.2.0. (By default, it builds against Hadoop trunk, which is 3.0.0.)
      4. Install the Tez jars into the local Maven repository with "mvn install -DskipTests".
      5. Check out the Pig Tez branch.
      6. Build Pig by running "ant jar-withouthadoop".
      7. Set up a single-node (or multi-node) Hadoop 2.2 cluster.
      8. Install Tez following the instructions on the Tez homepage.
      9. Run Pig with the "-x tez" option.
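
The build-related steps above (3 to 6, plus 9) can be sketched as a small shell script. This is only an illustration under stated assumptions: `TEZ_SRC`, `PIG_SRC`, and `script.pig` are hypothetical placeholders for checkouts and a script you already have, `-Dhadoop.version=2.2.0` is assumed to be the Maven property that selects the Hadoop version for the Tez build, and `bin/pig` is assumed to be the launcher inside the Pig checkout. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of setup steps 3-6 and 9. Cluster setup (steps 7-8) is not covered.
# TEZ_SRC, PIG_SRC and script.pig are hypothetical placeholders.
TEZ_SRC="${TEZ_SRC:-$HOME/src/tez}"
PIG_SRC="${PIG_SRC:-$HOME/src/pig-tez}"

# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2 check: `protoc --version` prints "libprotoc <version>".
# Pass that output as the first argument.
protoc_is_2_5_0() {
    [ "$1" = "libprotoc 2.5.0" ]
}

setup_pig_on_tez() {
    run cd "$TEZ_SRC" &&
    # Assumption: hadoop.version is the property that overrides the default Hadoop version.
    run mvn install -DskipTests -Dhadoop.version=2.2.0 &&
    run cd "$PIG_SRC" &&
    run ant jar-withouthadoop &&
    # Assumption: bin/pig is the launcher in the Pig checkout.
    run bin/pig -x tez script.pig
}

setup_pig_on_tez
```

Run it once in the default dry-run mode to review the commands, then rerun with `DRY_RUN=0` after confirming `protoc_is_2_5_0 "$(protoc --version)"` succeeds.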

      How to run Tez tests:

      • unit tests
        ant test-tez

        In the tez branch, exectype defaults to tez and hadoopversion defaults to 23. To run the unit tests in mr mode instead:

        ant test -Dexectype=mr -Dhadoopversion=20
        
      • e2e tests
        ant -Dharness.old.pig=$PIG_HOME -Dharness.hadoop.home=$HADOOP_HOME -Dharness.cluster.conf=$HADOOP_CONF -Dharness.cluster.bin=$HADOOP_BIN test-e2e-tez -Dhadoopversion=23
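
The test commands above can be wrapped in small helper functions so that a missing harness variable is caught before ant starts. This is a minimal sketch, not part of the Pig build: the function names `unit_test_command` and `e2e_tez_command` are hypothetical, and the functions only print the command line taken from the description above rather than running it.

```shell
#!/bin/sh
# Print the unit-test command for the requested exectype ("tez" or "mr").
unit_test_command() {
    case "$1" in
        tez) echo "ant test-tez" ;;
        mr)  echo "ant test -Dexectype=mr -Dhadoopversion=20" ;;
        *)   echo "usage: unit_test_command tez|mr" >&2; return 1 ;;
    esac
}

# Print the e2e command; fail early if any variable it relies on is unset or empty.
e2e_tez_command() {
    for name in PIG_HOME HADOOP_HOME HADOOP_CONF HADOOP_BIN; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            return 1
        fi
    done
    echo "ant -Dharness.old.pig=$PIG_HOME" \
         "-Dharness.hadoop.home=$HADOOP_HOME" \
         "-Dharness.cluster.conf=$HADOOP_CONF" \
         "-Dharness.cluster.bin=$HADOOP_BIN" \
         "test-e2e-tez -Dhadoopversion=23"
}
```

For example, `unit_test_command mr` prints the mr-mode command, and `e2e_tez_command` prints the full e2e invocation once the four variables are set.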
        

        Issue Links

          1. Move JobCreationException to org.apache.pig.backend.hadoop.executionengine | Sub-task | Closed | Cheolsoo Park
          2. Tez backend layout | Sub-task | Closed | Cheolsoo Park
          3. Add a base abstract class for ExecutionEngine | Sub-task | Closed | Cheolsoo Park
          4. Initial implementation of TezCompiler | Sub-task | Closed | Cheolsoo Park
          5. Initial implementation of TezJobControlCompiler | Sub-task | Closed | Cheolsoo Park
          6. Initial implementation of TezLauncher | Sub-task | Closed | Cheolsoo Park
          7. Initial implementation of TezStats | Sub-task | Closed | Cheolsoo Park
          8. Initial Implementation of PigProcessor | Sub-task | Closed | Mark Wagner
          9. Bump hadoop version to 2.2.0 | Sub-task | Closed | Cheolsoo Park
          10. Allow PigProcessor to handle multiple inputs | Sub-task | Closed | Mark Wagner
          11. Add TezMiniCluster for unit tests | Sub-task | Closed | Cheolsoo Park
          12. Empty plan fails to run | Sub-task | Closed | Daniel Dai
          13. Make register work | Sub-task | Closed | Daniel Dai
          14. Make order by work | Sub-task | Closed | Daniel Dai
          15. Add test-tez target to build.xml | Sub-task | Closed | Cheolsoo Park
          16. Make distinct work | Sub-task | Closed | Alex Bain
          17. Make limit work | Sub-task | Closed | Alex Bain
          18. Pig should be able to submit multiple DAG | Sub-task | Closed | Daniel Dai
          19. e2e test for tez | Sub-task | Closed | Daniel Dai
          20. Add diagnostic information to TezStats | Sub-task | Closed | Cheolsoo Park
          21. Fix tez Checkin_2 | Sub-task | Closed | Daniel Dai
          22. Initial implementation of combiner optimization | Sub-task | Closed | Cheolsoo Park
          23. Fix tez branch compilation with Hadoop 1.0 | Sub-task | Closed | Cheolsoo Park
          24. Implement optimizations for LIMIT | Sub-task | Closed | Alex Bain
          25. UniqueTez staging dir should be used for different users | Sub-task | Resolved | Daniel Dai
          26. Implement combiner optimizations for DISTINCT | Sub-task | Closed | Alex Bain
          27. Make Tez work with security | Sub-task | Closed | Rohini Palaniswamy
          28. Make split work with Tez | Sub-task | Closed | Rohini Palaniswamy
          29. Make union work with tez | Sub-task | Closed | Cheolsoo Park
          30. Fix dependencies in ivy.xml | Sub-task | Closed | Cheolsoo Park
          31. Port Package refactoring to Tez branch | Sub-task | Closed | Mark Wagner
          32. Fix e2e Operator_1, 5, Checkin_3, and Join_1 | Sub-task | Closed | Cheolsoo Park
          33. Move POSimpleTezLoad under tez package | Sub-task | Closed | Cheolsoo Park
          34. Tear down TezSessions when Pig exits | Sub-task | Closed | Rohini Palaniswamy
          35. Add counters to TezStats | Sub-task | Closed | Cheolsoo Park
          36. Implement replicated join in Tez | Sub-task | Closed | Cheolsoo Park
          37. Fix e2e tests Operators_3, Operators_5 | Sub-task | Closed | Daniel Dai
          38. Add order by string, descending order e2e tests | Sub-task | Closed | Daniel Dai
          39. Replace broadcast edges with scatter/gather edges in union | Sub-task | Closed | Cheolsoo Park
          40. TezCompiler adds duplicate predecessors of blocking operators to TezPlan | Sub-task | Closed | Rohini Palaniswamy
          41. Fix intermittent test failure Join_1 | Sub-task | Closed | Daniel Dai
          42. Make combiners, custom partitioners and secondary key sort work for multiple outputs | Sub-task | Closed | Rohini Palaniswamy
          43. Implement STREAM in Tez | Sub-task | Closed | Alex Bain
          44. Improve performance of order-by | Sub-task | Closed | Daniel Dai
          45. Make accumulator UDF work in Tez | Sub-task | Closed | Cheolsoo Park
          46. Implement skewed join in Tez | Sub-task | Closed | Cheolsoo Park
          47. Implement merge join in Tez | Sub-task | Closed | Daniel Dai
          48. Use Tez ObjectRegistry to cache FRJoin map and WeightedRangePartitioner map | Sub-task | Closed | Rohini Palaniswamy
          49. TEZ-41 break pig-tez | Sub-task | Closed | Daniel Dai
          50. Fix store after load | Sub-task | Closed | Daniel Dai
          51. Fix skewed join e2e tests | Sub-task | Closed | Cheolsoo Park
          52. Change tez version dependency as a result of TEZ-739 | Sub-task | Closed | Hitesh Shah
          53. Fix split + skewed join | Sub-task | Resolved | Rohini Palaniswamy
          54. Fix TestSkewedJoin in tez mode | Sub-task | Closed | Cheolsoo Park
          55. Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex | Sub-task | Closed | Rohini Palaniswamy
          56. Set MR runtime settings on tez runtime | Sub-task | Closed | Rohini Palaniswamy
          57. Use VertexGroup and Alias vertex for union | Sub-task | Closed | Cheolsoo Park
          58. Support for multiquery off in Tez | Sub-task | Closed | Rohini Palaniswamy
          59. Add support for non-Java UDF's | Sub-task | Closed | Alex Bain
          60. Make scalar work | Sub-task | Closed | Daniel Dai
          61. Fix desc order by in Tez | Sub-task | Closed | Daniel Dai
          62. CombinerOptimizer should not optimize cogroup case in tez | Sub-task | Closed | Daniel Dai
          63. Outer join fail on tez | Sub-task | Closed | Daniel Dai
          64. Properties aren't propagated to edges or vertices in Tez | Sub-task | Resolved | Mark Wagner
          65. Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex | Sub-task | Closed | Rohini Palaniswamy
          66. Work with TEZ-668 which allows starting and closing of inputs and outputs | Sub-task | Closed | Rohini Palaniswamy
          67. TezCompiler.visitUnion() doesn't add compiled TezOp to phyToTezOpMap | Sub-task | Closed | Cheolsoo Park
          68. Scripting UDF is broken after PIG-3629 | Sub-task | Closed | Daniel Dai
          69. Tez mini cluster tests run for a very long time with TezSession reuse on | Sub-task | Closed | Cheolsoo Park
          70. Fix TestTezCompiler#testReplicatedJoinInReducer | Sub-task | Closed | Cheolsoo Park
          71. TezResourceManager should not be a singleton | Sub-task | Closed | Daniel Dai
          72. POReservoirSample should handle endOfAllInput flag | Sub-task | Closed | Daniel Dai
          73. Multiquery with FRJoin fail | Sub-task | Closed | Daniel Dai
          74. NPE when POStream is not in the leaf vertex | Sub-task | Closed | Daniel Dai
          75. tuple in POStream binaryInputQueue keep changing | Sub-task | Closed | Daniel Dai
          76. Several changes in Tez e2e | Sub-task | Closed | Daniel Dai
          77. POValueInputTez should handle getNextTuple even after reader.next() returns null | Sub-task | Closed | Daniel Dai
          78. Parallelism specified by user is not honored if default parallelism is set to a higher value | Sub-task | Closed | Cheolsoo Park
          79. Fix some memory leaks affecting container reuse | Sub-task | Closed | Rohini Palaniswamy
          80. TestCustomPartitioner is broken in tez branch | Sub-task | Closed | Cheolsoo Park
          81. POPoissonSample should handle endOfAllInput flag | Sub-task | Closed | Daniel Dai
          82. Implement CROSS in Tez | Sub-task | Closed | Rohini Palaniswamy
          83. Implement RANK in Tez | Sub-task | Closed | Rohini Palaniswamy
          84. Remove reference to BroadcastKVReader as it is removed in TEZ-911 | Sub-task | Closed | Rohini Palaniswamy
          85. Make custom counter work | Sub-task | Closed | Daniel Dai
          86. Implement mapside cogroup in Tez | Sub-task | Resolved | Unassigned
          87. Organize tez code into subpackages | Sub-task | Closed | Rohini Palaniswamy
          88. Honor Mapreduce Distributed Cache settings and localize resources in Tez | Sub-task | Closed | Rohini Palaniswamy
          89. Pig on tez job hangs when AM has a failure and Multiquery fixes | Sub-task | Closed | Rohini Palaniswamy
          90. Pig script encounters error with Tez MemoryDistributor | Sub-task | Resolved | Unassigned
          91. e2e test Rank_9 fail | Sub-task | Closed | Daniel Dai
          92. e2e tests run all tests even execonly flag does not match | Sub-task | Closed | Daniel Dai
          93. Fix MergeJoin_8 failure | Sub-task | Closed | Daniel Dai
          94. UdfDistributedCache_1 fails in tez branch | Sub-task | Closed | Cheolsoo Park
          95. Global sort is not working (order by) Pig over Tez | Sub-task | Resolved | Unassigned
          96. Hash join followed by replicated join fails in Tez mode | Sub-task | Closed | Cheolsoo Park
          97. Fix memory leak with PigTezLogger | Sub-task | Closed | Rohini Palaniswamy
          98. Fix UnionOptimizer bug with expressions and MR compressions settings not honored | Sub-task | Closed | Rohini Palaniswamy
          99. Implement PPNL for Tez mode (Pig side changes) | Sub-task | Closed | Cheolsoo Park
          100. PigRecordWriter throws exception in Tez mode | Sub-task | Closed | Cheolsoo Park
          101. Fix e2e test failure CastScalar_11 | Sub-task | Closed | Daniel Dai
          102. Skewed join followed by replicated join fails in Tez | Sub-task | Closed | Cheolsoo Park
          103. Fix MR unit tests on tez branch | Sub-task | Closed | Daniel Dai
          104. Pig on tez fails to run in Oozie in secure cluster | Sub-task | Closed | Rohini Palaniswamy
          105. Get TezStats working for Oozie | Sub-task | Closed | Rohini Palaniswamy
          106. Make the interval of DAGStatus report configurable | Sub-task | Closed | Cheolsoo Park
          107. New interface for resetting static variables for jvm reuse | Sub-task | Closed | Rohini Palaniswamy
          108. Fix compilation failure due in Pig on Tez due to TEZ-1127 change | Sub-task | Resolved | Unassigned
          109. Refactor TezJob and TezLauncher | Sub-task | Closed | Cheolsoo Park
          110. Make Streaming UDF work in Tez | Sub-task | Closed | Daniel Dai
          111. ObjectCache cause ClassCastException | Sub-task | Closed | Cheolsoo Park
          112. Change from TezJobConfig to TezRuntimeConfiguration | Sub-task | Closed | Rohini Palaniswamy
          113. Accumulator UDF throws OOM in Tez | Sub-task | Closed | Rohini Palaniswamy
          114. NPE in packager when union + group-by followed by replicated join in Tez | Sub-task | Closed | Rohini Palaniswamy
          115. Implement merge cogroup in Tez | Sub-task | Closed | Daniel Dai
          116. Add Native operator to tez | Sub-task | Closed | Daniel Dai
          117. Create a target to run mr and tez unit test in one shot | Sub-task | Closed | Daniel Dai
          118. Pin Tez to 0.5.0 release | Sub-task | Closed | Cheolsoo Park
          119. Intermediate reducer parallelism in Tez should be higher | Sub-task | Closed | Rohini Palaniswamy
          120. Mapreduce ACLs should be translated to Tez ACLs | Sub-task | Closed | Rohini Palaniswamy
          121. Reset UDFContext state before OutputCommitter invocations in Tez | Sub-task | Closed | Rohini Palaniswamy
          122. Fix few issues related to Union, CROSS and auto parallelism in Tez | Sub-task | Closed | Rohini Palaniswamy
          123. PigProcessor does not set pig.datetime.default.tz | Sub-task | Closed | Rohini Palaniswamy
          124. ObjectCache should use ProcessorContext.getObjectRegistry() | Sub-task | Closed | Rohini Palaniswamy

            People

              Assignee: Cheolsoo Park (cheolsoo)
              Reporter: Cheolsoo Park (cheolsoo)
              Votes: 0
              Watchers: 26
