Hive / HIVE-8699

Enable support for common map join [Spark Branch]

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      This JIRA tracks issues related to common map-join support in Spark, including logical and physical optimizations. HIVE-8616 provided the initial processing, mainly represented by SparkMapJoinOptimizer. We need to continue the work to make map join work end to end, including the enhancements needed for SparkMapJoinOptimizer and the subsequent physical optimization, SparkMapJoinResolver.
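
      For readers unfamiliar with the technique, the following is a minimal sketch of the general idea behind a map join: the small table is loaded into an in-memory hash table, and the big table is streamed past it with no shuffle. It is not Hive's implementation. The class MapJoinSketch and its methods are hypothetical names chosen for illustration, and the sketch handles only an inner join on a single integer key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a generic in-memory hash join, NOT Hive's
// SparkMapJoinOptimizer, SparkMapJoinResolver, or HashTableLoader.
final class MapJoinSketch {

    // Hypothetical helper: a (joinKey, value) row.
    static Map.Entry<Integer, String> row(int key, String value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Build phase: hash every row of the small table by its join key.
    static Map<Integer, List<String>> buildHashTable(
            List<Map.Entry<Integer, String>> smallTable) {
        Map<Integer, List<String>> table = new HashMap<>();
        for (Map.Entry<Integer, String> r : smallTable) {
            table.computeIfAbsent(r.getKey(), k -> new ArrayList<>())
                 .add(r.getValue());
        }
        return table;
    }

    // Probe phase: stream the big table and emit one output row per match
    // (inner-join semantics; unmatched big-table rows are dropped).
    static List<String> probe(List<Map.Entry<Integer, String>> bigTable,
                              Map<Integer, List<String>> hashTable) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> r : bigTable) {
            List<String> matches = hashTable.get(r.getKey());
            if (matches == null) {
                continue;
            }
            for (String m : matches) {
                out.add(r.getKey() + "," + r.getValue() + "," + m);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> small =
                Arrays.asList(row(1, "US"), row(2, "DE"));
        List<Map.Entry<Integer, String>> big =
                Arrays.asList(row(1, "alice"), row(3, "bob"), row(2, "carol"));
        System.out.println(probe(big, buildHashTable(small)));
    }
}
```

      In this sketch the build and probe phases are separate methods because the two phases can run at different times and places: the hash table can be produced once and then shared with every task that scans the big table.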

        Issue Links

          1.
          Implement HashTableLoader for Spark map-join [Spark Branch] Sub-task Resolved Jimmy Xiang
          2.
          Dump small table join data for map-join [Spark Branch] Sub-task Resolved Jimmy Xiang
          3.
          Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch] Sub-task Resolved Chao Sun
          4.
          Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch] Sub-task Resolved Suhas Satish
          5.
          Combine nested map joins into the parent map join if possible [Spark Branch] Sub-task Resolved Szehon Ho
          6.
          Extra MapTask created but not connected [Spark Branch] Sub-task Resolved Szehon Ho
          7.
          Refactoring: move mapLocalWork field from MapWork to BaseWork Sub-task Resolved Xuefu Zhang
          8.
          Generate MapredLocalWork in SparkMapJoinResolver [Spark Branch] Sub-task Resolved Chao Sun
          9.
          Refactor to make splitting SparkWork a physical resolver [Spark Branch] Sub-task Resolved Rui Li
          10.
          Make HashTableSinkOperator works for Spark Branch [Spark Branch] Sub-task Resolved Jimmy Xiang
          11.
          Make RDD caching work for multi-insert after HIVE-8793 when map join is involved [Spark Branch] Sub-task Resolved Rui Li
          12.
          auto_join2.q produces incorrect tree [Spark Branch] Sub-task Resolved Chao Sun
          13.
          Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch] Sub-task Open Jimmy Xiang
          14.
          ColumnStatsTask fails because of SparkMapJoinResolver [Spark Branch] Sub-task Resolved Chao Sun
          15.
          Populate ExecMapperContext in SparkReduceRecordHandler [Spark Branch] Sub-task Resolved Chao Sun
          16.
          Needs to set hashTableMemoryUsage for MapJoinDesc [Spark Branch] Sub-task Resolved Chao Sun
          17.
          Investigate test failure on mapjoin_filter_on_outerjoin.q [Spark Branch] Sub-task Resolved Chao Sun
          18.
          Investigate test failures on auto_join30.q [Spark Branch] Sub-task Resolved Chao Sun
          19.
          Investigate test failure on auto_join22.q [Spark Branch] Sub-task Resolved Unassigned
          20.
          Investigate test failure on auto_join13.q [Spark Branch] Sub-task Resolved Unassigned
          21.
          Investigate test failures on auto_join6, auto_join7, auto_join18, auto_join18_multi_distinct [Spark Branch] Sub-task Resolved Chao Sun
          22.
          Investigate test failure on join34.q [Spark Branch] Sub-task Resolved Chao Sun
          23.
          Enable mapjoin hints [Spark Branch] Sub-task Resolved Chao Sun
          24.
          Enable non-staged mapjoin [Spark Branch] Sub-task Open Unassigned
          25.
          Investigate test failure on auto_join2.q [Spark Branch] Sub-task Resolved Chao Sun
          26.
          Investigate test failure for join_empty.q [Spark Branch] Sub-task Resolved Szehon Ho
          27.
          Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch] Sub-task Resolved Chao Sun
          28.
          Add SORT_QUERY_RESULTS for join tests that do not guarantee order Sub-task Resolved Chao Sun
          29.
          Investigate test failure on skewjoin.q [Spark Branch] Sub-task Resolved Chao Sun
          30.
          Fix memory limit check for combine nested mapjoins [Spark Branch] Sub-task Resolved Szehon Ho
          31.
          Enable Map Join [Spark Branch] Sub-task Resolved Chao Sun
          32.
          Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2 Sub-task Resolved Chao Sun
          33.
          Investigate test failure on bucketmapjoin7.q [Spark Branch] Sub-task Resolved Jimmy Xiang
          34.
          Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Sub-task Resolved Chao Sun
          35.
          Investigate mapjoin_mapjoin.q failure [Spark Branch] Sub-task Resolved Unassigned
          36.
          IndexOutOfBounds exception in mapjoin [Spark Branch] Sub-task Resolved Chao Sun
          37.
          Not a directory error in mapjoin_hook.q [Spark Branch] Sub-task Resolved Chao Sun
          38.
          Fix bucket related test failure: parquet_join.q [Spark Branch] Sub-task Resolved Jimmy Xiang
          39.
          Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] Sub-task Resolved Szehon Ho
          40.
          Union input to a join operator poses problem when converting to map join [Spark Branch] Sub-task Open wangwenli

          Activity

            People

              Assignee: Unassigned
              Reporter: Xuefu Zhang (xuefuz)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated: