  1. Pig
  2. PIG-5313

Support PARALLEL in STORE statement

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tez
    • Labels:
      None

      Description

      Restricting the number of output files is a very common use case. In Pig, users currently add an ORDER BY, GROUP BY, or DISTINCT with the required parallelism before STORE to achieve it. Each of these operations adds unnecessary processing overhead. Ideally, the STORE statement would support the PARALLEL clause directly, and the partitioning of data would be handled in a simpler and more efficient manner.

      This JIRA is mostly Tez-specific and requires TEZ-3865, which has more details on how this can be done via Tez. We will also have to add APIs to StoreFunc (HCatStorer, MultiStorage, etc.) to get the partition keys needed to partition the data for the STORE statement.
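
To illustrate the kind of partitioning described above, here is a minimal Python sketch. It is not Pig or Tez code and does not reflect the design in TEZ-3865. The function name `partition_records`, the `key_fn` parameter, and the sample records are all hypothetical. It only shows the general idea: hashing a partition key modulo the requested parallelism caps the number of output buckets (files) and keeps records with the same key together.

```python
import zlib
from collections import defaultdict


def partition_records(records, key_fn, parallel):
    """Assign each record to one of `parallel` buckets by hashing its key.

    Hypothetical helper for illustration only; not part of Pig or Tez.
    """
    if parallel < 1:
        raise ValueError("parallel must be >= 1")
    buckets = defaultdict(list)
    for record in records:
        key = str(key_fn(record)).encode("utf-8")
        # crc32 is deterministic across runs, unlike Python's built-in
        # hash() for strings, so a key always maps to the same bucket.
        bucket = zlib.crc32(key) % parallel
        buckets[bucket].append(record)
    return dict(buckets)


# Made-up sample data: (partition key, payload) pairs.
records = [
    ("2017-11-01", "a"),
    ("2017-11-01", "b"),
    ("2017-11-02", "c"),
    ("2017-11-03", "d"),
]

# Request at most 2 buckets, i.e. at most 2 output files.
parts = partition_records(records, key_fn=lambda r: r[0], parallel=2)
for bucket_id, bucket_records in sorted(parts.items()):
    print(bucket_id, bucket_records)
```

In this sketch the number of non-empty buckets never exceeds `parallel`, which is the property users currently obtain indirectly through ORDER BY, GROUP BY, or DISTINCT.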

              People

              • Assignee:
                Unassigned
              • Reporter:
                Rohini Palaniswamy (rohini)
              • Votes:
                0
              • Watchers:
                1
