Pig / PIG-4420

Support for map-side CROSS, similar to replicated join


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Our CROSS implementation is very costly. We recently had a case where a user was doing a CROSS of 30 million records against 3K records, and it caused a lot of disk error exceptions during the shuffle phase. We need to add support for a map-side cross syntax:

      C = CROSS A, B using 'replicate';

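      The smaller relation can be loaded into an in-memory list (replicated join loads it into a hash map instead) and iterated over for each record of the larger relation. Because the cross would then happen entirely in the map tasks, with no shuffle, it should give a major performance boost and drastically reduce resource usage.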
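      The strategy described above can be sketched in Java, assuming the smaller relation fits in each map task's memory (the same assumption replicated join makes). This is an illustration only: the class and method names (`MapSideCrossSketch`, `loadReplicated`, `crossStream`) are hypothetical and are not Pig's physical operators. Plain values stand in for Pig `Tuple` objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of a map-side (replicated) cross; not Pig's actual implementation.
public class MapSideCrossSketch {

    // Materialize the small ("replicated") relation once per map task.
    // Assumes it fits in memory. A list is enough because a cross has no join key to look up.
    static <T> List<T> loadReplicated(Iterable<T> small) {
        List<T> buffer = new ArrayList<>();
        for (T record : small) {
            buffer.add(record);
        }
        return buffer;
    }

    // Stream the large relation record by record and pair each record with every
    // buffered small-relation record. No shuffle is involved.
    static <L, S> void crossStream(Iterator<L> large, List<S> replicated, BiConsumer<L, S> emit) {
        while (large.hasNext()) {
            L bigRecord = large.next();
            for (S smallRecord : replicated) {
                emit.accept(bigRecord, smallRecord);
            }
        }
    }

    public static void main(String[] args) {
        List<String> small = loadReplicated(List.of("x", "y", "z"));
        List<String> out = new ArrayList<>();
        crossStream(List.of(1, 2).iterator(), small, (l, s) -> out.add(l + ":" + s));
        System.out.println(out); // prints [1:x, 1:y, 1:z, 2:x, 2:y, 2:z]
    }
}
```

      Each record of the larger relation is read once, in a single pass. The output still contains |A| × |B| records (9 × 10^10 for the 30 million × 3K case above), so the saving comes from skipping the shuffle, not from producing less data.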


People

    • Assignee: Unassigned
    • Reporter: Rohini Palaniswamy
    • Votes: 0
    • Watchers: 2
