PIG-4542 (sub-task of PIG-4059 Pig on Spark; project: Pig)

OutputConsumerIterator should flush buffered records

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Pending Closed
    • Affects Version/s: spark-branch
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: None

    Description

      Certain operators may buffer the output. We need to flush the last set of records from such operators, when we encounter the last input record, before calling getNextTuple() for the last time.

      Currently, to flush the last set of records, we compute RDD.count() and compare it with a running counter to determine whether we have reached the last record. This is unnecessary and inefficient, because computing the count requires an extra pass over the RDD.
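
The idea of flushing without a count can be illustrated with a minimal sketch. This is not the code from the attached patches: `BufferingOperator`, `FlushingIterator`, `BatchingOperator`, and `drain` are hypothetical names made up for this example, and the operator interface is an assumption rather than Pig's physical operator API. The wrapper detects the end of input from the input iterator's own `hasNext()` and then drains whatever the operator still holds, so no separate count is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FlushingIteratorSketch {

    /** Hypothetical stand-in for an operator that may buffer its output. */
    interface BufferingOperator<I, O> {
        /** Accepts one input record; returns the output that is ready (possibly none). */
        List<O> process(I input);

        /** Called once after the last input record; returns everything still buffered. */
        List<O> flush();
    }

    /** Wraps an input iterator and flushes the operator once the input is exhausted. */
    static final class FlushingIterator<I, O> implements Iterator<O> {
        private final Iterator<I> input;
        private final BufferingOperator<I, O> op;
        private Iterator<O> pending = Collections.emptyIterator();
        private boolean flushed = false;

        FlushingIterator(Iterator<I> input, BufferingOperator<I, O> op) {
            this.input = input;
            this.op = op;
        }

        @Override
        public boolean hasNext() {
            // Keep pulling until some output is available or everything is drained.
            while (!pending.hasNext()) {
                if (input.hasNext()) {
                    pending = op.process(input.next()).iterator();
                } else if (!flushed) {
                    // End of input detected by the iterator itself, no count needed.
                    flushed = true;
                    pending = op.flush().iterator();
                } else {
                    return false;
                }
            }
            return true;
        }

        @Override
        public O next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pending.next();
        }
    }

    /** Example operator (an assumption for the demo): emits records in fixed-size batches. */
    static final class BatchingOperator implements BufferingOperator<Integer, Integer> {
        private final int batchSize;
        private List<Integer> buffer = new ArrayList<>();

        BatchingOperator(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public List<Integer> process(Integer input) {
            buffer.add(input);
            if (buffer.size() < batchSize) {
                return Collections.emptyList();
            }
            return takeBuffer();
        }

        @Override
        public List<Integer> flush() {
            return takeBuffer();
        }

        private List<Integer> takeBuffer() {
            List<Integer> out = buffer;
            buffer = new ArrayList<>();
            return out;
        }
    }

    /** Runs all records through a batching operator and collects the output. */
    static List<Integer> drain(List<Integer> records, int batchSize) {
        Iterator<Integer> it =
            new FlushingIterator<>(records.iterator(), new BatchingOperator(batchSize));
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // With batch size 2 and five records, the fifth record is only emitted by flush().
        System.out.println(drain(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

In this sketch, the last record would be lost without the `flush()` call, because the final batch never reaches its full size. Driving the flush from `hasNext()` keeps the logic in a single pass over the input.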

      Attachments

        1. PIG-4542.1.patch
          36 kB
          Mohit Sabharwal
        2. PIG-4542.2.patch
          38 kB
          Mohit Sabharwal
        3. PIG-4542.3.patch
          38 kB
          Mohit Sabharwal
        4. PIG-4542.patch
          14 kB
          Mohit Sabharwal

            People

              Assignee: Mohit Sabharwal (mohitsabharwal)
              Reporter: Mohit Sabharwal (mohitsabharwal)
              Votes: 0
              Watchers: 3