Pig / PIG-2497

Order of execution of fs, store and sh commands in Pig is not maintained


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1
    • Fix Version/s: 0.10.0, 0.9.3, 0.11
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I have a Pig script like this:
      --Load data, process it and store to two outputs

      a = load 'dummy.txt' as (cookie: chararray,timestamp: long,url: chararray);
      b = group a by (cookie);
      c = foreach b generate group, COUNT_STAR(a);
      store c into '$COUNT_OUTPUT' using PigStorage();
      store b into '$GRID_OUTPUT' using PigStorage();
      --Remove local file, copy to local and remove processed file from grid
      sh rm -rf '$LOCAL_OUTPUT';
      fs -getmerge '$GRID_OUTPUT' '$LOCAL_OUTPUT';
      fs -rmr '$GRID_OUTPUT';
      

      Pig does not guarantee the order of command execution in the script above; that is, the "store", "sh rm ...", "fs -getmerge ..." and "fs -rmr ..." commands will not necessarily be executed in the order written.

      Pig guarantees that "fs" commands and Pig "store" commands are executed in sequence. However, "sh" commands are executed before anything else (in normal multi-query mode), because they run as soon as the parser sees them; they go through a different code path within Pig. This behavior needs to be changed.
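      To make the two code paths concrete, here is a toy model in Python. It is a hypothetical sketch, not Pig's actual implementation: the function name execution_order, the sh_runs_at_parse flag and the abbreviated command labels are all illustrative assumptions. It shows why an "sh" command jumps ahead of earlier "store" commands when it is executed at parse time, while "store" and "fs" keep their relative order.

```python
# Hypothetical toy model of the ordering problem in PIG-2497.
# This is NOT Pig's real code; it only mirrors the two code paths
# described in the issue (sh at parse time vs. store/fs in sequence).

def execution_order(commands, sh_runs_at_parse):
    """Return command labels in the order they would actually run.

    commands: list of (kind, label) pairs, kind in {"store", "fs", "sh"}.
    sh_runs_at_parse: True models the reported behavior, False the desired one.
    """
    ran_immediately = []  # commands executed while the script is being parsed
    queued = []           # commands executed later, in script order
    for kind, label in commands:
        if kind == "sh" and sh_runs_at_parse:
            # Separate code path: sh runs as soon as the parser reaches it.
            ran_immediately.append(label)
        else:
            # store and fs keep their order relative to each other.
            queued.append(label)
    return ran_immediately + queued


# Abbreviated labels for the commands in the script above (hypothetical).
script = [
    ("store", "store c"),
    ("store", "store b"),
    ("sh", "sh rm -rf"),
    ("fs", "fs -getmerge"),
    ("fs", "fs -rmr"),
]

print(execution_order(script, sh_runs_at_parse=True))
# ['sh rm -rf', 'store c', 'store b', 'fs -getmerge', 'fs -rmr']
print(execution_order(script, sh_runs_at_parse=False))
# ['store c', 'store b', 'sh rm -rf', 'fs -getmerge', 'fs -rmr']
```

      In this model, routing "sh" through the same queue as "store" and "fs" is enough to restore the written order.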

      Thanks
      Viraj

        Attachments

        1. PIG-2497-1.patch
          4 kB
          Daniel Dai


            People

            • Assignee:
              Daniel Dai (daijy)
              Reporter:
              Viraj Bhat (viraj)
            • Votes:
              0
              Watchers:
              2
