Pig / PIG-5323

Implement LastInputStreamingOptimizer in Tez

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None

    Description

      As described at http://pig.apache.org/docs/r0.17.0/perf.html#join-optimizations:

      The optimization for regular joins ensures that the last table in the join is streamed through rather than brought into memory. This reduces the amount of memory used, which means you can avoid spilling the data and should be able to scale your query to larger data volumes.

      To take advantage of this optimization, make sure that the table with the largest number of tuples per key is the last table in your query. In some of our tests we saw a 10x performance improvement as a result of this optimization.

      Pig on Tez does not apply this optimization today: both tables are materialized as InternalCachedBag.
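
      To illustrate why streaming the last input saves memory, here is a minimal Python sketch of the idea. It is not the code in Pig, Tez, or the attached patch. It assumes the rows for each key arrive sorted by input index, so every row of the first table is seen before any row of the last table. The names join_streaming_last and TaggedRow are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Hashable, Iterable, Iterator, Tuple

# (join key, input index, payload); index 0 = first table, 1 = last table.
TaggedRow = Tuple[Hashable, int, object]


def join_streaming_last(rows: Iterable[TaggedRow]) -> Iterator[tuple]:
    """Inner join of two inputs that buffers only the first input per key.

    Assumes `rows` is sorted by (key, input index), so for each key all
    rows from input 0 arrive before any row from input 1.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        first_input_bag = []  # the only per-key state held in memory
        for _, index, payload in group:
            if index == 0:
                first_input_bag.append(payload)
            else:
                # Last input: emit joined rows immediately, never store it.
                for buffered in first_input_bag:
                    yield (key, buffered, payload)


# Illustrative data: the table with more tuples per key goes last.
small = [("a", "s1"), ("b", "s2")]
large = [("a", "l1"), ("a", "l2"), ("a", "l3"), ("c", "l4")]
tagged = sorted(
    [(k, 0, v) for k, v in small] + [(k, 1, v) for k, v in large],
    key=itemgetter(0, 1),
)
result = list(join_streaming_last(tagged))
```

      In this sketch the memory held per key grows only with the first table's rows for that key, which is why the table with the most tuples per key should come last. Materializing both inputs would instead hold the rows of both tables for a key at once.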

      Attachments

        1. PIG-5323-1.patch (128 kB, Rohini Palaniswamy)

            People

              Assignee: Rohini Palaniswamy
              Reporter: Rohini Palaniswamy
              Votes: 0
              Watchers: 2