  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4891

Implement FR join by broadcasting the small RDD instead of making more copies of the data

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: None

    Description

      In the current implementation of FRJoin (PIG-4771), we simply set the replication of the data to 10 to make data access more efficient, because the existing FRJoin algorithms can be reused this way. If we find that broadcasting the small RDD improves performance significantly, we need to figure out how to implement FRJoin with a broadcast RDD in the current code base.
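
      The idea of a broadcast-based replicated join can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the attached patch and not Pig's or Spark's API: the class `BroadcastJoinSketch` and its methods are hypothetical, rows are simplified to `{key, value}` string pairs, and the in-memory map stands in for what would be a Spark broadcast variable shipped once to each executor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these names do not exist in Pig or Spark.
final class BroadcastJoinSketch {

    // Build an in-memory lookup table from the small relation.
    // In a Spark job this table would be broadcast once per executor
    // instead of raising the replication of the small file.
    static Map<String, List<String>> buildLookup(List<String[]> small) {
        Map<String, List<String>> lookup = new HashMap<>();
        for (String[] row : small) {
            lookup.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return lookup;
    }

    // Stream one partition of the large relation and probe the lookup table.
    // No shuffle of the large relation is needed (inner-join semantics).
    static List<String> joinPartition(List<String[]> largePartition,
                                      Map<String, List<String>> lookup) {
        List<String> out = new ArrayList<>();
        for (String[] row : largePartition) {
            List<String> matches = lookup.getOrDefault(row[0], Collections.emptyList());
            for (String match : matches) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(
                new String[]{"1", "a"}, new String[]{"2", "b"});
        List<String[]> large = Arrays.asList(
                new String[]{"1", "x"}, new String[]{"3", "y"}, new String[]{"2", "z"});
        for (String line : joinPartition(large, buildLookup(small))) {
            System.out.println(line);
        }
    }
}
```

      The design point the sketch tries to show is that the small side is materialized once as a hash table and then probed locally by every partition of the large side, so the large relation is never shuffled.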

      Attachments

        1. PIG-4891_2.patch
          23 kB
          Nándor Kollár

            People

              Assignee: nkollar Nándor Kollár
              Reporter: kellyzly liyunzhang
              Votes: 0
              Watchers: 3
