Apache Tez / TEZ-1152 Optimize broadcast join for scalability / TEZ-1157

Optimize broadcast: tasks belonging to the same job on the same machine should not download multiple copies of broadcast data


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.1
    • None

    Description

      Currently, tasks that belong to the same job and run on the same machine each download their own copy of the broadcast data. An optimization would be to download one copy per machine and have the remaining tasks refer to that downloaded copy.
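
      One way the proposed sharing could look, as a minimal sketch and not the patch attached to this issue: tasks on a machine agree on a local cache directory, the first task to take a file lock downloads the data, and every later task reuses the published file. `SharedBroadcastCache`, `Downloader`, and the cache directory layout are hypothetical names used for illustration; they are not Tez APIs, and the key is assumed to be a plain file name.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical per-machine cache for broadcast data; not part of the Tez API. */
class SharedBroadcastCache {

    /** Hypothetical stand-in for the remote fetch of one broadcast output. */
    interface Downloader {
        byte[] download(String key) throws IOException;
    }

    // File locks are held on behalf of the whole JVM, so threads inside one
    // JVM are serialized with a plain monitor before taking the file lock.
    private static final Object JVM_LOCK = new Object();

    private final Path cacheDir;

    SharedBroadcastCache(Path cacheDir) throws IOException {
        this.cacheDir = Files.createDirectories(cacheDir);
    }

    /** Returns the local copy for key, downloading it only if no task has done so yet. */
    Path getOrDownload(String key, Downloader downloader) throws IOException {
        Path target = cacheDir.resolve(key);
        if (Files.exists(target)) {
            return target; // fast path: another task already published the copy
        }
        synchronized (JVM_LOCK) {
            Path lockFile = cacheDir.resolve(key + ".lock");
            try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = channel.lock()) { // blocks while another process downloads
                if (Files.exists(target)) {
                    return target; // published while this task waited for the lock
                }
                byte[] data = downloader.download(key);
                // Write to a temp file first so readers never see a partial copy.
                Path tmp = Files.createTempFile(cacheDir, key, ".tmp");
                Files.write(tmp, data);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                return target;
            }
        }
    }
}
```

      In this sketch each task would call `getOrDownload` with the same cache directory and key. Only the task that finds no published file under the lock performs the download; the others read the returned path.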

      (results after this feature)

      Attachments

        1. connections.png
          77 kB
          Gopal Vijayaraghavan
        2. latency.png
          67 kB
          Gopal Vijayaraghavan
        3. TEZ-1152.WIP.patch
          17 kB
          Rajesh Balamohan
        4. TEZ-1157.10.patch
          36 kB
          Gopal Vijayaraghavan
        5. TEZ-1157.3.WIP.patch
          33 kB
          Gopal Vijayaraghavan
        6. TEZ-1157.4.WIP.patch
          36 kB
          Gopal Vijayaraghavan
        7. TEZ-1157.5.WIP.patch
          35 kB
          Gopal Vijayaraghavan
        8. TEZ-1157.6.patch
          37 kB
          Gopal Vijayaraghavan
        9. TEZ-1157.7.patch
          35 kB
          Gopal Vijayaraghavan
        10. TEZ-1157.8.patch
          35 kB
          Gopal Vijayaraghavan
        11. TEZ-1157.9.patch
          35 kB
          Gopal Vijayaraghavan
        12. TEZ-broadcast-shuffle+vertex-parallelism.patch
          14 kB
          Gopal Vijayaraghavan

            People

              Assignee: Gopal Vijayaraghavan (gopalv)
              Reporter: Rajesh Balamohan (rajesh.balamohan)
              Votes: 0
              Watchers: 6
