  1. Hive
  2. HIVE-8699 Enable support for common map join [Spark Branch]
  3. HIVE-8810

Make HashTableSinkOperator work for Spark Branch [Spark Branch]


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: spark-branch
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      In MR, all small tables for a particular map join (MJ) operator share the same instance of HashTableSinkOperator, while in the Spark branch, each small table corresponds to a different HashTableSinkOperator instance. This difference causes some issues.

      For instance, HashTableSinkOperator#processOp uses a tag to look up information in various data structures, such as joinKeys, filterMaps, and joinValues. Those data structures store the information as it was BEFORE the MJ operator was split from its parents. Because we later use a separate HashTableSinkOperator for each small table, that information is no longer valid, and the method fails.
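
      The mismatch described above can be illustrated with a minimal Java sketch. It is not Hive code: TagLookupSketch, PerTableSink, and JOIN_KEYS_BY_ORIGINAL_TAG are hypothetical names, and the sketch assumes that a sink with a single parent sees that parent as tag 0. It shows metadata indexed by each table's position in the original map join, a per-table sink that indexes it with its local tag, and one possible remedy in which the sink remembers its table's original position. Whether the real failure is a wrong lookup or an exception depends on details not stated in this issue.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: every name here is hypothetical, not a Hive class.
final class TagLookupSketch {

    // Metadata laid out by each input's position (tag) in the original
    // map join: index 0 is the big table, 1 and 2 are small tables.
    static final List<List<String>> JOIN_KEYS_BY_ORIGINAL_TAG = Arrays.asList(
            Arrays.asList("big.id"),
            Arrays.asList("small1.id"),
            Arrays.asList("small2.id"));

    // A sink dedicated to one small table.
    static final class PerTableSink {
        private final int originalTag;

        PerTableSink(int originalTag) {
            this.originalTag = originalTag;
        }

        // Naive lookup: trusts the local tag, which no longer matches
        // the layout of the metadata.
        List<String> keysNaive(int localTag) {
            return JOIN_KEYS_BY_ORIGINAL_TAG.get(localTag);
        }

        // Possible remedy: ignore the local tag and use the position the
        // table had in the original map join.
        List<String> keysRemapped() {
            return JOIN_KEYS_BY_ORIGINAL_TAG.get(originalTag);
        }
    }

    public static void main(String[] args) {
        PerTableSink sinkForSmall2 = new PerTableSink(2);
        int localTag = 0; // assumption: a single parent is seen as tag 0

        System.out.println("naive:    " + sinkForSmall2.keysNaive(localTag));
        System.out.println("remapped: " + sinkForSmall2.keysRemapped());
    }
}
```

      In this sketch the naive lookup returns the big table's keys for rows that belong to the second small table, while the remapped lookup returns the intended entry.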

      This JIRA is to track and solve these related issues.

      Attachments

        1. HIVE-8810.1-spark.patch (33 kB, Jimmy Xiang)
        2. HIVE-8810.2-spark.patch (24 kB, Jimmy Xiang)
        3. HIVE-8810.3-spark.patch (20 kB, Jimmy Xiang)


          People

            Assignee: Jimmy Xiang (jxiang)
            Reporter: Chao Sun (csun)
            Votes: 0
            Watchers: 4
