Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-978

ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) and ERROR 2999: (Unexpected internal error. null) when using Multi-Query optimization

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.6.0
    • 0.6.0
    • documentation
    • None

    Description

      I have Pig script of this form.. which I execute using Multi-query optimization.

      A = load '/user/viraj/firstinput' using PigStorage();
      B = group ....
      C = ..agrregation function
      store C into '/user/viraj/firstinputtempresult/days1';
      ..
      Atab = load '/user/viraj/secondinput' using PigStorage();
      Btab = group ....
      Ctab = ..agrregation function
      store Ctab into '/user/viraj/secondinputtempresult/days1';
      ..
      E = load '/user/viraj/firstinputtempresult/' using PigStorage();
      F = group 
      G = aggregation function
      store G into '/user/viraj/finalresult1';
      
      Etab = load '/user/viraj/secondinputtempresult/' using PigStorage();
      Ftab = group 
      Gtab = aggregation function
      store Gtab into '/user/viraj/finalresult2';
      

      2009-07-20 22:05:44,507 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2100: hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist. Details at logfile: /homes/viraj/pigscripts/pig_1248127173601.log)

      is due to the mismatch of store/load commands. The script first stores files into the 'days1' directory (store C into '/user/viraj/firstinputtempresult/days1' using PigStorage(), but it later loads from the top level directory (E = load '/user/viraj/firstinputtempresult/' using PigStorage()) instead of the original directory (/user/viraj/firstinputtempresult/days1).

      The current multi-query optimizer can't solve the dependency between these two commands--they have different load file paths. So the jobs will run concurrently and result in the errors.

      The solution is to add 'exec' or 'run' command after the first two stores . This will force the first two store commands to run before the rest commands.

      It would be nice to see this fixed as a part of an enhancement to the Multi-query. We either disable the Multi-query or throw a warning/error message, so that the user can correct his load/store statements.

      Viraj

      Attachments

        1. pig-latin-users-guide-2.patch
          3 kB
          Corinne Chandel
        2. pig-latin-users-guide.patch
          3 kB
          Corinne Chandel

        Activity

          People

            chandec Corinne Chandel
            viraj Viraj Bhat
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: