  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4542

OutputConsumerIterator should flush buffered records

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Pending Closed
    • Affects Version/s: spark-branch
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels:
      None

      Description

      Certain operators may buffer their output. When we encounter the last input record, we need to flush the remaining buffered records from such operators before calling getNextTuple() for the last time.

      Currently, to flush the last set of records, we compute RDD.count() and compare the count with a running counter to determine whether we have reached the last record. This is unnecessary and inefficient.
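
      The idea can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches: BatchSummer, FlushingIterator and drain are hypothetical names invented for this example, and the operator is a toy that emits the sum of every batchSize inputs. The sketch assumes the end of input can be detected from the input iterator's hasNext(), so no record count is needed to decide when to flush.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FlushOnLastDemo {

    /** Hypothetical buffering operator: emits the sum of every batchSize inputs. */
    static class BatchSummer {
        private final int batchSize;
        private int sum = 0;
        private int buffered = 0;

        BatchSummer(int batchSize) {
            this.batchSize = batchSize;
        }

        /** Accepts one record; returns a result only when a batch is full, else null. */
        Integer accept(int value) {
            sum += value;
            buffered++;
            return buffered == batchSize ? emit() : null;
        }

        /** Called once at end of input; returns the partial batch, or null if empty. */
        Integer flush() {
            return buffered > 0 ? emit() : null;
        }

        private Integer emit() {
            int result = sum;
            sum = 0;
            buffered = 0;
            return result;
        }
    }

    /** Hypothetical iterator that flushes the operator when the input runs out. */
    static class FlushingIterator implements Iterator<Integer> {
        private final Iterator<Integer> input;
        private final BatchSummer op;
        private Integer pending;          // next output record, if already computed
        private boolean flushed = false;  // true once the end-of-input flush has run

        FlushingIterator(Iterator<Integer> input, BatchSummer op) {
            this.input = input;
            this.op = op;
        }

        @Override
        public boolean hasNext() {
            while (pending == null) {
                if (input.hasNext()) {
                    // Normal case: feed one more input record to the operator.
                    pending = op.accept(input.next());
                } else if (!flushed) {
                    // Input exhausted: flush buffered records exactly once.
                    flushed = true;
                    pending = op.flush();
                } else {
                    return false;
                }
            }
            return true;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Integer result = pending;
            pending = null;
            return result;
        }
    }

    /** Runs all records through the operator and collects every output. */
    static List<Integer> drain(List<Integer> records, int batchSize) {
        Iterator<Integer> it = new FlushingIterator(records.iterator(), new BatchSummer(batchSize));
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Five inputs with batches of two: the trailing 5 is only emitted by the flush.
        System.out.println(drain(Arrays.asList(1, 2, 3, 4, 5), 2)); // prints [3, 7, 5]
    }
}
```

      In this sketch the flush is triggered by the iterator itself running dry, which avoids a separate pass over the data to count records first.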

        Attachments

        1. PIG-4542.patch
          14 kB
          Mohit Sabharwal
        2. PIG-4542.3.patch
          38 kB
          Mohit Sabharwal
        3. PIG-4542.2.patch
          38 kB
          Mohit Sabharwal
        4. PIG-4542.1.patch
          36 kB
          Mohit Sabharwal

              People

              • Assignee:
                mohitsabharwal Mohit Sabharwal
                Reporter:
                mohitsabharwal Mohit Sabharwal
              • Votes:
                0
                Watchers:
                3