  1. Pig
  2. PIG-627

PERFORMANCE: multi-query optimization

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.3.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, if your Pig script contains multiple stores and some shared computation, Pig will execute several independent queries. For instance:

      A = load 'data' as (a, b, c);
      B = filter A by a > 5;
      store B into 'output1';
      C = group B by b;
      store C into 'output2';

      This script results in a map-only job that generates output1, followed by a map-reduce job that generates output2. As a result, the data is read, parsed, and filtered twice, which is unnecessary and costly.
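
      To make the cost concrete, here is a toy model in Python of the two execution strategies. It is only a sketch, not how Pig plans or runs jobs: the sample rows, the CountingSource class, and the run_independent / run_shared functions are all hypothetical names invented for this illustration. It counts how many rows are read when load and filter are repeated per store versus done once and shared.

      ```python
      from collections import defaultdict

      # Made-up stand-in for the 'data' input, with fields (a, b, c).
      ROWS = [(3, "x", 1), (7, "x", 2), (9, "y", 3), (6, "y", 4)]


      class CountingSource:
          """Hypothetical loader that counts how many rows it hands out."""

          def __init__(self, rows):
              self.rows = rows
              self.rows_read = 0

          def scan(self):
              for row in self.rows:
                  self.rows_read += 1
                  yield row


      def group_by_b(rows):
          """Group rows by field b, mirroring 'C = group B by b'."""
          groups = defaultdict(list)
          for row in rows:
              groups[row[1]].append(row)
          return dict(groups)


      def run_independent(source):
          """One query per store: load and filter are repeated for each output."""
          output1 = [r for r in source.scan() if r[0] > 5]
          output2 = group_by_b(r for r in source.scan() if r[0] > 5)
          return output1, output2


      def run_shared(source):
          """Load and filter once, then feed both stores from the same rows."""
          filtered = [r for r in source.scan() if r[0] > 5]
          return filtered, group_by_b(filtered)


      independent_source = CountingSource(ROWS)
      shared_source = CountingSource(ROWS)
      independent_result = run_independent(independent_source)
      shared_result = run_shared(shared_source)
      ```

      Both strategies produce the same two outputs; the independent variant scans the input once per store, while the shared variant scans it once in total.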

        Attachments

        1. multiquery-phase3_0423.patch
          77 kB
          Richard Ding
        2. error_handling_0416.patch
          27 kB
          Gunther Hagleitner
        3. error_handling_0415.patch
          27 kB
          Gunther Hagleitner
        4. doc-fix.patch
          5 kB
          Gunther Hagleitner
        5. merge-041409.patch
          21 kB
          Gunther Hagleitner
        6. streaming-fix.patch
          10 kB
          Gunther Hagleitner
        7. merge_trunk_to_branch.patch
          13 kB
          Gunther Hagleitner
        8. non_reversible_store_load_dependencies_2.patch
          90 kB
          Gunther Hagleitner
        9. non_reversible_store_load_dependencies.patch
          76 kB
          Gunther Hagleitner
        10. noop_filter_absolute_path_flag_0401.patch
          125 kB
          Gunther Hagleitner
        11. noop_filter_absolute_path_flag.patch
          88 kB
          Gunther Hagleitner
        12. fix_store_prob.patch
          26 kB
          Gunther Hagleitner
        13. merge_741727_HEAD__0324_2.patch
          595 kB
          Gunther Hagleitner
        14. merge_741727_HEAD__0324.patch
          591 kB
          Gunther Hagleitner
        15. multiquery-phase2_0323.patch
          88 kB
          Richard Ding
        16. multiquery_explain_fix.patch
          3 kB
          Gunther Hagleitner
        17. multiquery-phase2_0313.patch
          86 kB
          Richard Ding
        18. multiquery_0306.patch
          32 kB
          Richard Ding
        19. file_cmds-0305.patch
          33 kB
          Gunther Hagleitner
        20. multi-store-0304.patch
          78 kB
          Gunther Hagleitner
        21. multi-store-0303.patch
          77 kB
          Gunther Hagleitner
        22. multiquery_0224.patch
          146 kB
          Gunther Hagleitner
        23. multiquery_0223.patch
          110 kB
          Gunther Hagleitner

            People

            • Assignee:
              hagleitn Gunther Hagleitner
              Reporter:
              olgan Olga Natkovich
            • Votes:
              0
              Watchers:
              7