Pig / PIG-5313

Support PARALLEL in STORE statement

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tez
    • Labels: None

    Description

      Restricting the number of output files is a very common use case. In Pig, users currently add an ORDER BY, GROUP BY or DISTINCT with the required parallelism before STORE to achieve it. All of these operations add unnecessary processing overhead. It would be ideal if the STORE statement supported the PARALLEL clause and the partitioning of data was handled in a simpler and more efficient manner.
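
      As an illustration, the current workaround and the requested behaviour might look like the sketch below. The paths, schema and parallelism of 10 are made up for the example, and the PARALLEL clause on STORE is hypothetical syntax that Pig does not support today; the exact form would be decided as part of this issue.

```pig
-- Illustrative input; the path and schema are assumptions for this example.
A = LOAD 'input' USING PigStorage('\t') AS (id:chararray, value:int);

-- Current workaround: add an ORDER BY purely to force a reduce phase with
-- 10 reducers, so that the output directory contains 10 part files.
B = ORDER A BY id PARALLEL 10;
STORE B INTO 'output_workaround' USING PigStorage('\t');

-- Requested behaviour (HYPOTHETICAL, not valid Pig today): set the number of
-- output files directly on STORE, without the extra sort.
-- STORE A INTO 'output_proposed' USING PigStorage('\t') PARALLEL 10;
```

      In the workaround the sort does no useful work for the result; it is there only because PARALLEL applies to operators that trigger a reduce phase.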

      This issue is largely Tez-specific and requires TEZ-3865, which describes in more detail how this can be done in Tez. We will also have to add APIs to StoreFunc (for implementations such as HCatStorer and MultiStorage) to get the partition keys used to partition the data for the STORE statement.

            People

              Assignee: Unassigned
              Reporter: Rohini Palaniswamy
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated: