Pig / PIG-4856

Optimization for pig on spark

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels:
      None

      Description

      Now that the unit test fixes for milestone 2 are finished, we will start milestone 3, which focuses on optimizing the project.

        Sub-Tasks

        1. Implement FR Join for spark engine (Sub-task, Closed, liyunzhang)
        2. Implement Merge join for spark engine (Sub-task, Closed, Xianda Ke)
        3. Optimization for join/group case for spark mode (Sub-task, Closed, liyunzhang)
        4. Use pigmix to test the performance of pig on spark (Sub-task, Open, liyunzhang)
        5. MultiQueryOptimizerSpark doesn't remove all redundant nodes in spark plan (Sub-task, Open, liyunzhang)
        6. Implement Skewed join for spark engine (Sub-task, Closed, Xianda Ke)
        7. Re-design Spark plan to optimize the RDD pipeline (Sub-task, Open, Unassigned)
        8. Do not use OperatorPlan#forceConnect in MultiQueryOptimizationSpark (Sub-task, Closed, liyunzhang)
        9. Implement secondary sort using one shuffle (Sub-task, Closed, liyunzhang)
        10. Run pigmix on spark on yarn with multiple nodes (Sub-task, Resolved, Unassigned)
        11. Calculate the value of parallelism for spark mode (Sub-task, Closed, liyunzhang)
        12. Optimize combine case for spark mode (Sub-task, Closed, liyunzhang)
        13. Remove the deserialization and serialization of JobConf in code for spark mode (Sub-task, Closed, liyunzhang)
        14. Add a physical operator to broadcast small RDDs (Sub-task, Closed, Xianda Ke)
        15. Optimize sort case when data is skewed (Sub-task, Patch Available, liyunzhang)
        16. Create SparkCompiler#getSamplingJob in spark mode (Sub-task, Closed, liyunzhang)
        17. Support outer join for skewedjoin in spark mode (Sub-task, Closed, Xianda Ke)
        18. Initialize PigConstants.TASK_INDEX in spark mode correctly (Sub-task, Closed, liyunzhang)
        19. Initialize MRConfiguration.JOB_ID in spark mode correctly (Sub-task, Closed, Ádám Szita)
        20. Initialize SchemaTupleBackend correctly in backend in spark mode if spark job has more than 1 stage (Sub-task, Closed, Ádám Szita)
        21. Set SPARK_REDUCERS by pig.properties not by system configuration (Sub-task, Closed, liyunzhang)
        22. Fix TestPigRunner.simpleMultiQueryTest3 unit test failure (Sub-task, Closed, Nándor Kollár)
        23. Enable persist/cache mechanism in Pig (Sub-task, Open, Xianda Ke)
        24. Support outer join for SkewedJoin in spark mode (Sub-task, Resolved, Xianda Ke)

            People

            • Assignee: Unassigned
            • Reporter: kellyzly liyunzhang
            • Votes: 0
            • Watchers: 3
