Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-7292 Hive on Spark
  3. HIVE-7384

Research into reduce-side join [Spark Branch]

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels:
      None

      Description

      Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complication in reduce-side join, which extensively utilizes key tag and shuffle behavior. Our design principle prefers to making Hive implementation work out of box also, which might requires new functionality from Spark. The tasks is to research into this area, identifying requirements for Spark community and the work to be done on Hive to make reduce-side join work.

      A design doc might be needed for this. For more information, please refer to the overall design doc on wiki.

        Attachments

        1. sales_items.txt
          0.2 kB
          Brock Noland
        2. sales_products.txt
          0.0 kB
          Brock Noland
        3. sales_stores.txt
          0.0 kB
          Brock Noland
        4. Hive on Spark Reduce Side Join.docx
          112 kB
          Szehon Ho

          Issue Links

            Activity

              People

              • Assignee:
                szehon Szehon Ho
                Reporter:
                xuefuz Xuefu Zhang
              • Votes:
                0 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: