Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • tez
    • None

    Description

      e.g Group by followed by join on the same key
      This can be done in one vertex with multiple inputs instead of having an extra vertex to do the join. i.e Currently Vertex 1 (load relation1) > Vertex 2 (group by) -> Vertex 4 (join) < Vertex 3 (load relation 2). This could be changed to Vertex 1 (load relation1) > Vertex 2 (group by and join) < Vertex 3 (load relation 2)

      And idea of this kind of optimization from YSmart that hive already integrate it. Now pig has already integrate tez, so it would be natural to integrate YSmart into pig on tez.

      YSmart paper http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
      YSmart slide http://www.slideshare.net/YinHuai/hive-correlation-optimizer?qid=1bb427af-e349-40c0-b9ad-46f508403879&v=default&b=&from_search=4

      Attachments

        Activity

          People

            Unassigned Unassigned
            rohini Rohini Palaniswamy
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: