Hive / HIVE-8699 Enable support for common map join [Spark Branch] / HIVE-8810

Make HashTableSinkOperator work for Spark Branch [Spark Branch]

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: spark-branch
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      In MR, all small tables for a given map join (MJ) operator share a single HashTableSinkOperator instance, whereas in the Spark branch each small table gets its own HashTableSinkOperator instance. This difference causes several issues.

      For instance, HashTableSinkOperator#processOp uses a tag to look up information in several data structures, such as joinKeys, filterMaps, and joinValues. Those data structures store the information as it was BEFORE the MJ operator is split from its parents. Because we later use a separate HashTableSinkOperator for each small table, that information is no longer valid, and the method fails.
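
      The mismatch can be pictured with a toy model. This is a minimal sketch and not Hive code: the class TagLookupSketch, its methods, and the key names are hypothetical, and it assumes (the description does not say so) that a per-table sink ends up holding only its own table's entry while rows still carry the tag assigned before the split.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names exist in Hive.
public class TagLookupSketch {

    // Hypothetical stand-in for tag-indexed metadata such as joinKeys:
    // one entry per input tag, built before the MJ operator is split.
    static final List<String> SHARED_JOIN_KEYS =
            Arrays.asList("bigTableKey", "smallTable1Key", "smallTable2Key");

    // MR-style: a single sink serves every small table, so the full
    // tag-indexed list is valid for any incoming tag.
    static String lookupShared(int tag) {
        return SHARED_JOIN_KEYS.get(tag);
    }

    // Spark-style (ASSUMPTION): a sink dedicated to one small table holds
    // only that table's entry, yet is still asked to look up by the old tag.
    static String lookupPerTable(List<String> ownKeys, int tag) {
        return ownKeys.get(tag);
    }

    public static void main(String[] args) {
        int tag = 2; // tag assigned before the split

        // Works: the shared structure has an entry at index 2.
        System.out.println(lookupShared(tag));

        // Fails: the per-table structure has a single entry at index 0.
        List<String> own = Arrays.asList("smallTable2Key");
        try {
            lookupPerTable(own, tag);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("tag " + tag + " is not a valid index for a per-table sink");
        }
    }
}
```

      Whether the real failure is an out-of-range index or a lookup that returns another table's entry depends on how the operator is cloned; the sketch shows only that a tag computed before the split cannot be trusted afterwards.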

      This JIRA tracks and resolves these related issues.

      Attachments

        1. HIVE-8810.3-spark.patch
          20 kB
          Jimmy Xiang
        2. HIVE-8810.2-spark.patch
          24 kB
          Jimmy Xiang
        3. HIVE-8810.1-spark.patch
          33 kB
          Jimmy Xiang

          People

            Assignee: Jimmy Xiang (jxiang)
            Reporter: Chao Sun (csun)
            Votes: 0
            Watchers: 4
