Hive / HIVE-10302

Load small tables (for map join) in executor memory only once [Spark Branch]

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      A Spark executor usually has multiple cores, so multiple map-join tasks may run in the same executor, concurrently or sequentially. Currently, each task loads its own copy of the small tables for the map join into memory, which is inefficient. Ideally, the small tables should be loaded only once and shared among the tasks running in that executor.
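      The sharing described above can be sketched with a per-JVM (per-executor) cache keyed by table, where `ConcurrentHashMap.computeIfAbsent` guarantees the loader runs at most once per key even under concurrent task requests. This is a minimal illustration, not Hive's actual implementation; the class and method names (`SmallTableCache`, `get`) are hypothetical.

      ```java
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;
      import java.util.concurrent.atomic.AtomicInteger;
      import java.util.function.Supplier;

      // Hypothetical sketch of a per-executor small-table cache.
      // In a Spark executor all tasks share one JVM, so a static map
      // is visible to every map-join task running in that executor.
      public class SmallTableCache {
          private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

          // Counts how many times a small table was actually loaded (for the demo below).
          static final AtomicInteger loads = new AtomicInteger();

          // computeIfAbsent is atomic: the loader is invoked at most once per key,
          // even when several tasks request the same small table concurrently.
          public static Object get(String tableKey, Supplier<Object> loader) {
              return CACHE.computeIfAbsent(tableKey, k -> {
                  loads.incrementAndGet();
                  return loader.get();
              });
          }

          public static void main(String[] args) throws Exception {
              // Two "tasks" ask for the same small table at the same time.
              Runnable task = () -> get("small_table_1", Object::new);
              Thread t1 = new Thread(task);
              Thread t2 = new Thread(task);
              t1.start();
              t2.start();
              t1.join();
              t2.join();
              // Despite two concurrent tasks, the table was loaded exactly once.
              System.out.println(loads.get()); // prints 1
          }
      }
      ```

      A real implementation would also need an eviction or release policy so cached tables do not pin executor memory after the tasks that needed them finish.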

        Attachments

        1. HIVE-10302.spark-1.patch
          8 kB
          Jimmy Xiang
        2. HIVE-10302.2-spark.patch
          8 kB
          Jimmy Xiang
        3. HIVE-10302.3-spark.patch
          7 kB
          Jimmy Xiang
        4. HIVE-10302.4-spark.patch
          9 kB
          Jimmy Xiang
        5. 10302.patch
          9 kB
          Xuefu Zhang


               People

               • Assignee: Jimmy Xiang (jxiang)
               • Reporter: Jimmy Xiang (jxiang)
               • Votes: 0
               • Watchers: 4
