Pig / PIG-96

It should be possible to spill big databags to HDFS

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: data
    • Labels: None

    Description

      Currently, databags are spilled only to local disk, which costs two disk I/O operations. This is inefficient when a databag is very large.
      We should take advantage of HDFS: if a databag is too big (that is, DataBag.getMemorySize() exceeds a large threshold), spill it to HDFS instead, and read it back from HDFS in parallel when the data is required.
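
The threshold check proposed above could look roughly like the sketch below. It is only an illustration: `SpillPolicy`, `Target`, `chooseTarget`, and the 1 GiB default are hypothetical names and values that are not part of Pig, and the `memorySizeBytes` argument stands in for whatever `DataBag.getMemorySize()` returns. The issue itself only says "a big threshold", so the actual cutoff is left open.

```java
// Hypothetical sketch of the spill-target decision described in this issue.
// None of these names exist in Pig; they are illustrative only.
final class SpillPolicy {

    /** Where a databag should be spilled. */
    enum Target { LOCAL_DISK, HDFS }

    // Assumed default cutoff (1 GiB); the issue does not specify a value.
    static final long DEFAULT_HDFS_THRESHOLD_BYTES = 1L << 30;

    private final long hdfsThresholdBytes;

    SpillPolicy(long hdfsThresholdBytes) {
        if (hdfsThresholdBytes < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        this.hdfsThresholdBytes = hdfsThresholdBytes;
    }

    /**
     * Picks a spill target from the bag's estimated in-memory size.
     * memorySizeBytes stands in for the result of DataBag.getMemorySize().
     * Bags strictly larger than the threshold go to HDFS; all others
     * keep the existing behaviour of spilling to local disk.
     */
    Target chooseTarget(long memorySizeBytes) {
        return memorySizeBytes > hdfsThresholdBytes ? Target.HDFS : Target.LOCAL_DISK;
    }
}
```

A caller would construct the policy once, for example `new SpillPolicy(SpillPolicy.DEFAULT_HDFS_THRESHOLD_BYTES)`, and consult `chooseTarget` each time a bag is about to spill. The strict comparison mirrors the `>` in the description, so a bag exactly at the threshold still goes to local disk.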

          People

            Assignee: Unassigned
            Reporter: Pi Song (pi_song)