Pig / PIG-2699

Reduce the number of instances of Load and Store Funcs down to 2+1. It should be 1 in the front-end and 1 in the backend

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11
    • Component/s: internal-udfs
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Attached is a patch that gets it down to 3 instances of each.
      Here is the report of the remaining calls.
      Some methods are still called more often than necessary; this should be improved as well.

      A = LOAD 'foo' USING TestLoadStoreFuncLifeCycle$Loader();
      STORE A INTO 'bar' USING TestLoadStoreFuncLifeCycle$Storer();
      
      
      report:
      3 instances of Loader
      20 calls to Loader
      3 instances of Storer
      24 calls to Storer
      
      all calls:
      Loader[1].<init>()
      Loader[1].relativeToAbsolutePath(foo, file:/Users/julien/svn/pig/trunk-LoadStoreFunc-lifecycle)
      Loader[1].setUDFContextSignature(A_1-0)
      Loader[1].getSchema(foo, org.apache.hadoop.mapreduce.Job@7ee49dcd)
      Storer[1].<init>()
      Storer[1].setStoreFuncUDFContextSignature(A_1-1)
      Storer[1].relToAbsPathForStoreLocation(bar, file:/Users/julien/svn/pig/trunk-LoadStoreFunc-lifecycle)
      Storer[1].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@776be68f)
      Storer[1].getOutputFormat()
      Loader[1].getStatistics(foo, org.apache.hadoop.mapreduce.Job@11e9c82e)
      Loader[1].setLocation(foo, org.apache.hadoop.mapreduce.Job@11e9c82e)
      Storer[1].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@57d840cd)
      Storer[2].<init>()
      Storer[2].setStoreFuncUDFContextSignature(A_1-1)
      Storer[2].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@76996cca)
      Storer[2].getOutputFormat()
      Loader[2].<init>()
      Loader[2].setUDFContextSignature(A_1-0)
      Loader[2].setLocation(foo, org.apache.hadoop.mapreduce.Job@317cfd38)
      Loader[2].getInputFormat()
      Storer[3].<init>()
      Storer[3].setStoreFuncUDFContextSignature(A_1-1)
      Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@459d3b3a)
      Storer[3].getOutputFormat()
      Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@225f1ae9)
      Loader[3].<init>()
      Loader[3].setUDFContextSignature(A_1-0)
      Loader[3].setLocation(foo, org.apache.hadoop.mapreduce.Job@6b98e8b4)
      Loader[3].getInputFormat()
      Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@5fb11b79)
      Storer[3].getOutputFormat()
      Storer[3].prepareToWrite(org.apache.pig.builtin.mock.Storage$MockRecordWriter@49b09282)
      Loader[3].setUDFContextSignature(A_1-0)
      Loader[3].prepareToRead(org.apache.pig.builtin.mock.Storage$MockRecordReader@2c8c7d6, Number of splits :1...)
      Loader[3].getNext()
      Storer[3].putNext((a))
      Loader[3].getNext()
      Storer[3].putNext((b))
      Loader[3].getNext()
      Storer[3].putNext((c))
      Loader[3].getNext()
      Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@3ebfbbe3)
      Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@14d964af)
      Storer[1].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@644ca6b6)
      
      constructor calls:
      Loader[1].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:426)
      org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3170)
      org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1293)
      org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
      org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
      org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
      org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
      org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Storer[1].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.parser.LogicalPlanBuilder.buildStoreOp(LogicalPlanBuilder.java:486)
      org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:6336)
      org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1337)
      org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
      org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
      org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
      org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
      org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Storer[2].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:232)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:168)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:200)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
      org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
      org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:258)
      Loader[2].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:254)
      org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
      org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:258)
      Storer[3].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:232)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:84)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:66)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
      Loader[3].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:158)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:106)
      

      In trunk this was:

      5 instances of Loader
      31 calls to Loader
      6 instances of Storer
      30 calls to Storer
      
      all calls:
      Loader[1].<init>()
      Loader[2].<init>()
      Loader[2].relativeToAbsolutePath(foo, file:/Users/julien/svn/pig/trunk)
      Storer[1].<init>()
      Storer[2].<init>()
      Storer[2].setStoreFuncUDFContextSignature(A_bar_org.apache.pig.TestLoadStoreFuncLifeCycle$Storer)
      Storer[2].relToAbsPathForStoreLocation(bar, file:/Users/julien/svn/pig/trunk)
      Storer[3].<init>()
      Storer[3].setStoreFuncUDFContextSignature(1-0_bar_org.apache.pig.TestLoadStoreFuncLifeCycle$Storer)
      Loader[3].<init>()
      Loader[3].setUDFContextSignature(A)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@4c349471)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@24c0f1ec)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@900bac2)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@635aed57)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@2d7cec96)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@4b947496)
      Storer[3].setStoreFuncUDFContextSignature(A_bar_org.apache.pig.TestLoadStoreFuncLifeCycle$Storer)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@776be68f)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@560c3014)
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@5773ec72)
      Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@bb273cc)
      Storer[3].getOutputFormat()
      Loader[3].getSchema(foo, org.apache.hadoop.mapreduce.Job@45660d6)
      Loader[3].setLocation(foo, org.apache.hadoop.mapreduce.Job@d2368df)
      Loader[3].getStatistics(foo, org.apache.hadoop.mapreduce.Job@d2368df)
      Storer[4].<init>()
      Storer[4].setStoreFuncUDFContextSignature(A_bar_org.apache.pig.TestLoadStoreFuncLifeCycle$Storer)
      Storer[4].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@78ff9053)
      Storer[5].<init>()
      Storer[5].setStoreFuncUDFContextSignature(A_bar_org.apache.pig.TestLoadStoreFuncLifeCycle$Storer)
      Storer[5].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@336d8196)
      Storer[5].getOutputFormat()
      Loader[4].<init>()
      Loader[4].setUDFContextSignature(A)
      Loader[4].setLocation(foo, org.apache.hadoop.mapreduce.Job@61250ff2)
      Loader[4].getInputFormat()
      Storer[6].<init>()
      Storer[6].setStoreFuncUDFContextSignature(A_bar_org.apache.pig.TestLoadStoreFuncLifeCycle$Storer)
      Storer[6].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@604788d5)
      Storer[6].getOutputFormat()
      Storer[6].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@7f342545)
      Loader[5].<init>()
      Loader[5].setUDFContextSignature(A)
      Loader[5].setLocation(foo, org.apache.hadoop.mapreduce.Job@459d3b3a)
      Loader[5].getInputFormat()
      Storer[6].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@795e0c2b)
      Storer[6].getOutputFormat()
      Storer[6].prepareToWrite(org.apache.pig.builtin.mock.Storage$MockRecordWriter@7c34151f)
      Loader[5].setUDFContextSignature(A)
      Loader[5].prepareToRead(org.apache.pig.builtin.mock.Storage$MockRecordReader@62114b17, Number of splits :1...)
      Loader[5].getNext()
      Storer[6].putNext((a))
      Loader[5].getNext()
      Storer[6].putNext((b))
      Loader[5].getNext()
      Storer[6].putNext((c))
      Loader[5].getNext()
      Storer[6].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@bf47ae8)
      Storer[6].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@4bb7b407)
      Storer[4].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@3cee6ad6)
      
      constructor calls:
      Loader[1].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:791)
      org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:780)
      org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4670)
      org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3117)
      org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1293)
      org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
      org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
      org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
      org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
      org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Loader[2].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.parser.LogicalPlanBuilder.getAbolutePathForLoad(LogicalPlanBuilder.java:417)
      org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:436)
      org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3170)
      org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1293)
      org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
      org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
      org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
      org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
      org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Storer[1].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:791)
      org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:780)
      org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4670)
      org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:6312)
      org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1337)
      org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
      org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
      org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
      org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
      org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Storer[2].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.parser.LogicalPlanBuilder.getAbolutePathForStore(LogicalPlanBuilder.java:478)
      org.apache.pig.parser.LogicalPlanBuilder.buildStoreOp(LogicalPlanBuilder.java:499)
      org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:6336)
      org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1337)
      org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
      org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
      org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
      org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
      org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Storer[3].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.newplan.logical.relational.LOStore.<init>(LOStore.java:55)
      org.apache.pig.parser.LogicalPlanBuilder.buildStoreOp(LogicalPlanBuilder.java:505)
      org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:6336)
      org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1337)
      org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
      org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
      org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
      org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
      org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Loader[3].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.newplan.logical.relational.LOLoad.getLoadFunc(LOLoad.java:77)
      org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:149)
      org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
      org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68)
      org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
      org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84)
      org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
      org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
      org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
      org.apache.pig.PigServer$Graph.compile(PigServer.java:1630)
      org.apache.pig.PigServer$Graph.compile(PigServer.java:1624)
      org.apache.pig.PigServer$Graph.access$2(PigServer.java:1623)
      org.apache.pig.PigServer.execute(PigServer.java:1246)
      org.apache.pig.PigServer.access$0(PigServer.java:1237)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1556)
      Storer[4].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:232)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:499)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:281)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:178)
      org.apache.pig.PigServer.launchPlan(PigServer.java:1279)
      org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1264)
      org.apache.pig.PigServer.execute(PigServer.java:1254)
      org.apache.pig.PigServer.access$0(PigServer.java:1237)
      org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1556)
      org.apache.pig.PigServer.registerQuery(PigServer.java:534)
      org.apache.pig.PigServer.registerQuery(PigServer.java:547)
      Storer[5].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:232)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:168)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:200)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
      org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
      org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:258)
      Loader[4].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:254)
      org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
      org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:258)
      Storer[6].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:232)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:84)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:66)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
      Loader[5].<init> called by 
      org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:158)
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:106)
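
The reports above list, for each instance, the constructor call and every later method call. A minimal sketch of how such a report could be collected is shown here. It is not the code from the attached patches: `CallLog` and this `Loader` are hypothetical stand-ins that do not extend Pig's `LoadFunc`, and the line format only imitates the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (an assumption for illustration): numbers each instance of a
// type and records every call as a line such as "Loader[1].<init>()".
final class CallLog {
    private static final Map<String, Integer> INSTANCES = new HashMap<>();
    private static final List<String> CALLS = new ArrayList<>();

    private CallLog() {}

    // Called from a constructor: registers the instance and returns its 1-based id.
    static synchronized int newInstance(String type) {
        int id = INSTANCES.merge(type, 1, Integer::sum);
        CALLS.add(type + "[" + id + "].<init>()");
        return id;
    }

    // Records one method call together with its arguments.
    static synchronized void call(String type, int id, String method, Object... args) {
        String joined = Arrays.toString(args);               // e.g. "[foo, bar]"
        String argList = joined.substring(1, joined.length() - 1);
        CALLS.add(type + "[" + id + "]." + method + "(" + argList + ")");
    }

    static synchronized int instances(String type) {
        return INSTANCES.getOrDefault(type, 0);
    }

    static synchronized List<String> calls() {
        return new ArrayList<>(CALLS);
    }

    static synchronized void reset() {
        INSTANCES.clear();
        CALLS.clear();
    }
}

// Stand-in for an instrumented load function (hypothetical, not Pig's LoadFunc).
final class Loader {
    private final int id = CallLog.newInstance("Loader");

    void setLocation(String location) {
        CallLog.call("Loader", id, "setLocation", location);
    }
}
```

Creating two `Loader` objects and calling `setLocation("foo")` on the second one records three lines in the log, in the same `Type[n].method(args)` shape used in the reports.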
      
      1. PIG-2699_a.patch
        32 kB
        Julien Le Dem
      2. PIG-2699_b.patch
        33 kB
        Julien Le Dem
      3. PIG-2699_c.patch
        81 kB
        Julien Le Dem
      4. PIG-2699_d.patch
        110 kB
        Julien Le Dem
      5. PIG-2699_e.patch
        110 kB
        Julien Le Dem
      6. PIG-2699_f.patch
        110 kB
        Julien Le Dem
      7. PIG-2699.patch
        25 kB
        Julien Le Dem

          Activity

          Julien Le Dem added a comment -

          Hi Koji,
          Yes, this is "new Configuration()" picking up the config file generated for the minicluster in the previous test.
          I'll fix the test.

          Koji Noguchi added a comment -

          This is indeed related.

          TestUDFContext was fixed by PIG-2809

          TestGrunt will be fixed by PIG-2820

          Thanks Julien! Only one more unit test remaining.
          TestNewPlanOperatorPlan.testRelationalSameOpDifferentPreds
          Is this also related?

          Julien Le Dem added a comment -

          Hi Koji
          This is indeed related.
          TestUDFContext was fixed by PIG-2809
          TestGrunt will be fixed by PIG-2820

          Koji Noguchi added a comment -

          I haven't looked into TestUDFContext, TestGrunt yet

          org.apache.pig.test.TestGrunt.testCD and
          org.apache.pig.test.TestGrunt.testFsCommand are failing for me. Is this related?

          Julien Le Dem added a comment -

          see PIG-2807
          I haven't looked into TestUDFContext, TestGrunt yet

          Julien Le Dem added a comment -

          I have a patch for those. Will submit it soon.

          Daniel Dai added a comment -

          Julien, how about other failures:
          TestNewPlanOperatorPlan, TestParser, TestPigStorage, TestUDFContext, TestGrunt.

          Are you still working on it?

          Julien Le Dem added a comment -

          see PIG-2790 for TestLOLoadDeterminedSchema

          Daniel Dai added a comment -

          TestNewPlanOperatorPlan, TestParser, TestPigStorage as well.

          Daniel Dai added a comment -

          It seems TestLOLoadDeterminedSchema is broken with the patch. Julien, do you have time to take a look?

          Julien Le Dem added a comment -

          PIG-2699_f.patch fixing TestLoad error

          Julien Le Dem added a comment -

          PIG-2699_e.patch
          change signature in helper method for consistency

          Julien Le Dem added a comment -

          PIG-2699_d.patch minor changes.
          prefix signatures with alias.
          separate Test helper method

          Ashutosh Chauhan added a comment -

          +1 thanks for doing this Julien! I have been waiting for ages for someone to pick up this tricky but highly useful refactoring and cleanup.

          Jeremy Hanna added a comment -

          Nice - we've seen many instances where we just know that methods are called a bunch of times and one of those times, the argument will be non-null. The trick has been when it's not null, keep hold of it! Thanks for doing all of this!
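
The workaround described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions: `RememberingFunc` and `configure` are hypothetical names and are not part of Pig's API.

```java
// Hypothetical stand-in for a load/store function whose setup method may be
// invoked several times, with a null argument on some of those invocations.
final class RememberingFunc {
    private String setting;   // last non-null value seen so far

    // Keeps hold of the argument only when it is non-null, so a later
    // call with null does not erase a value received earlier.
    void configure(String valueOrNull) {
        if (valueOrNull != null) {
            setting = valueOrNull;
        }
    }

    String setting() {
        return setting;
    }
}
```

With fewer instances and fewer repeated calls, this kind of defensive code should be needed less often.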

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/5159/
          -----------------------------------------------------------

          Review request for pig.

          Summary
          -------

          Pig creates too many instances of Load and Store Funcs. It should be 1 in the front-end and 1 in the backend

          This addresses bug PIG-2699.
          https://issues.apache.org/jira/browse/PIG-2699

          Diffs


          /trunk/src/org/apache/pig/PigServer.java 1339793
          /trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1339793
          /trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1339793
          /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java 1339793
          /trunk/src/org/apache/pig/newplan/logical/relational/LOLoad.java 1339793
          /trunk/src/org/apache/pig/newplan/logical/relational/LOStore.java 1339793
          /trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1339793
          /trunk/src/org/apache/pig/newplan/logical/rules/LoadStoreFuncDupSignatureValidator.java 1339793
          /trunk/src/org/apache/pig/newplan/logical/rules/PartitionFilterOptimizer.java 1339793
          /trunk/src/org/apache/pig/newplan/logical/rules/TypeCastInserter.java 1339793
          /trunk/src/org/apache/pig/newplan/logical/visitor/ScalarVisitor.java 1339793
          /trunk/src/org/apache/pig/parser/FunctionType.java 1339793
          /trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1339793
          /trunk/src/org/apache/pig/parser/QueryParserUtils.java 1339793
          /trunk/test/org/apache/pig/TestLoadStoreFuncLifeCycle.java PRE-CREATION
          /trunk/test/org/apache/pig/parser/TestScalarVisitor.java 1339793
          /trunk/test/org/apache/pig/test/TestInputOutputFileValidator.java 1339793
          /trunk/test/org/apache/pig/test/TestLogToPhyCompiler.java 1339793
          /trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1339793
          /trunk/test/org/apache/pig/test/TestMRCompiler.java 1339793
          /trunk/test/org/apache/pig/test/TestNewPlanFilterRule.java 1339793
          /trunk/test/org/apache/pig/test/TestNewPlanListener.java 1339793
          /trunk/test/org/apache/pig/test/Util.java 1339793

          Diff: https://reviews.apache.org/r/5159/diff

          Testing
          -------

          test-commit + new test

          Thanks,

          Julien

          Julien Le Dem added a comment -

          See review: https://reviews.apache.org/r/5159/
          Julien Le Dem added a comment -

          PIG-2699_c.patch

          • The setContextSignatures methods were being called multiple times with different values. I changed the logic so that the signature is unique from the beginning and cannot be modified.
          • I refactored one of the unit tests, which had a lot of duplication and needed to be modified to be more stable.
          • The getSchema() method was called multiple times because of a caching mechanism that did not handle null. I changed it to a single initialization at the beginning.
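The getSchema() point can be illustrated with a minimal sketch. It assumes the old cache used null to mean "not fetched yet", which would re-invoke a loader that legitimately returns null. The class names (NullSentinelSchemaCache, EagerSchemaHolder, SchemaCacheDemo) are hypothetical and not taken from the patch, and a String stands in for Pig's schema type.

```java
import java.util.function.Supplier;

/** Hypothetical cache that uses null to mean "not fetched yet". */
final class NullSentinelSchemaCache {
    private final Supplier<String> loader; // stands in for a call to getSchema()
    private String cached;

    NullSentinelSchemaCache(Supplier<String> loader) {
        this.loader = loader;
    }

    String getSchema() {
        // A null result is indistinguishable from "never fetched",
        // so the loader is invoked again on every lookup.
        if (cached == null) {
            cached = loader.get();
        }
        return cached;
    }
}

/** Hypothetical holder that fetches the schema exactly once, up front. */
final class EagerSchemaHolder {
    private final String schema; // may be null; that is a valid answer

    EagerSchemaHolder(Supplier<String> loader) {
        this.schema = loader.get();
    }

    String getSchema() {
        return schema;
    }
}

final class SchemaCacheDemo {
    /** Returns how many times the loader ran over three schema lookups. */
    static int loaderCalls(boolean eager) {
        int[] calls = {0};
        Supplier<String> loader = () -> { calls[0]++; return null; };
        if (eager) {
            EagerSchemaHolder holder = new EagerSchemaHolder(loader);
            for (int i = 0; i < 3; i++) holder.getSchema();
        } else {
            NullSentinelSchemaCache cache = new NullSentinelSchemaCache(loader);
            for (int i = 0; i < 3; i++) cache.getSchema();
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("null sentinel: " + loaderCalls(false) + " calls");
        System.out.println("single init:   " + loaderCalls(true) + " calls");
    }
}
```

With a loader that always returns null, the sentinel variant invokes it on each of the three lookups, while the single-initialization variant invokes it once.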
          Julien Le Dem added a comment -

          PIG-2699_b.patch
          with Apache headers

          Julien Le Dem added a comment -

          PIG-2699_a.patch
          addresses the comments

          Dmitriy V. Ryaboy added a comment -

          Thanks for cleaning this up.

          Please add Apache headers to the new file.

          Minor nits:

          Use log4j instead of printlns in the test. In fact, why are there printlns in the test? Just to get the report? Should that be a separate utility?

          Looks like indentation might be somewhat broken in a few places:

                   if (absolutePath == null) {
          -            absolutePath = stoFunc.relToAbsPathForStoreLocation( filename, 
          +                absolutePath = stoFunc.relToAbsPathForStoreLocation(
          
          Julien Le Dem added a comment -

          PIG-2699.patch:
          test/org/apache/pig/TestLoadStoreFuncLifeCycle.java
          generates the report in the description.
          The patch removes some of the instantiations:

          • type checking: can be done with the class without creating a new instance
          • LogicalPlanBuilder creates an instance to convert a relative path to an absolute path and then creates a LogicalOperator that instantiates a new one. I changed it to reuse the one just created.
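The type-checking bullet relies on Java being able to answer "is this class a load function?" from the Class object alone. A minimal sketch, using hypothetical stand-in types (LoadFuncLike, CountingLoader, TypeCheckDemo) rather than Pig's real classes:

```java
/** Hypothetical stand-in for a load-function base type. */
interface LoadFuncLike {}

/** Loader that counts its own instantiations, in the spirit of the test report. */
final class CountingLoader implements LoadFuncLike {
    static int instances = 0;

    CountingLoader() {
        instances++;
    }
}

final class TypeCheckDemo {
    /** Checks the type by class name without constructing an instance. */
    static boolean isLoadFunc(String className) {
        try {
            // initialize=false: the class is loaded but not initialized or instantiated.
            Class<?> clazz = Class.forName(
                    className, false, TypeCheckDemo.class.getClassLoader());
            return LoadFuncLike.class.isAssignableFrom(clazz);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
    }

    public static void main(String[] args) {
        boolean ok = isLoadFunc(CountingLoader.class.getName());
        System.out.println("is load func: " + ok
                + ", instances created: " + CountingLoader.instances);
    }
}
```

The check succeeds while the instance counter stays at zero, which is the saving the bullet describes.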

          left to do:
          the PigInputFormat (same for the output format) gets the PhysicalOperator by deserializing it from the conf. There should be a lookup mechanism to get the instance of Load/StoreFunc from a registry (by LogicalOperator signature?)
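One way to picture the proposed lookup mechanism is a registry keyed by operator signature. This is only a sketch of the idea in this comment: FuncRegistry and its methods are hypothetical and are not part of Pig, and such a registry could only share instances within a single JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical per-JVM registry: at most one function instance per signature. */
final class FuncRegistry<T> {
    private final Map<String, T> bySignature = new ConcurrentHashMap<>();

    /** Returns the instance for the signature, invoking the factory only on first use. */
    T getOrCreate(String signature, Supplier<T> factory) {
        return bySignature.computeIfAbsent(signature, s -> factory.get());
    }

    int size() {
        return bySignature.size();
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        FuncRegistry<Object> registry = new FuncRegistry<>();
        // Signatures in the style of the report in the description.
        Object first = registry.getOrCreate("A_1-0", Object::new);
        Object again = registry.getOrCreate("A_1-0", Object::new);
        System.out.println("same instance reused: " + (first == again));
    }
}
```

Callers that currently deserialize and re-instantiate would instead ask the registry, so repeated lookups for the same signature return the same object.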


            People

            • Assignee:
              Julien Le Dem
              Reporter:
              Julien Le Dem
            • Votes:
              0
              Watchers:
              10