Hive / HIVE-7526

Research using the groupBy transformation to replace Hive's existing partitionByKey and SparkCollector combination


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      Currently, SparkClient shuffles data by calling partitionByKey(). This transformation outputs <key, value> tuples. However, Hive's ExecMapper expects <key, iterator<value>> tuples, and Spark's groupByKey() appears to produce exactly that. By using groupByKey(), we may therefore be able to drop Hive's own key-clustering mechanism (in HiveReduceFunction). This task is to try that approach.
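      The difference between the two shuffle output shapes can be modeled with a minimal, dependency-free Java sketch. It uses neither Spark nor Hive: clusterByKey, groupByKey, reduceFromPairs and reduceFromGroups are hypothetical helpers that only model the data shapes. With a partition-only shuffle, the reduce side receives flat <key, value> pairs and must cluster them itself. The sketch assumes hash-based clustering, which is not necessarily how HiveReduceFunction does it. With a groupByKey-style shuffle, the reduce side receives <key, iterable<value>> groups ready to consume.

```java
import java.util.*;

/**
 * Dependency-free sketch (not Hive or Spark code) of two shuffle output shapes:
 *   partition-only shuffle   -> (key, value) pairs; the reducer clusters keys itself
 *   groupByKey-style shuffle -> (key, Iterable<value>) groups; no reducer-side clustering
 */
public class ShuffleShapes {

    /** Hypothetical reducer-side clustering of flat (key, value) pairs (assumed hash-based). */
    static <K, V> Map<K, List<V>> clusterByKey(Iterator<Map.Entry<K, V>> pairs) {
        // LinkedHashMap keeps keys in first-seen order so printed output is stable.
        Map<K, List<V>> clustered = new LinkedHashMap<>();
        while (pairs.hasNext()) {
            Map.Entry<K, V> pair = pairs.next();
            clustered.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return clustered;
    }

    /**
     * Hypothetical stand-in for Spark's groupByKey(): in Spark the framework performs this
     * grouping; here it is simulated with clusterByKey so the example needs no Spark dependency.
     */
    static <K, V> List<Map.Entry<K, Iterable<V>>> groupByKey(List<Map.Entry<K, V>> shuffled) {
        List<Map.Entry<K, Iterable<V>>> groups = new ArrayList<>();
        for (Map.Entry<K, List<V>> e : clusterByKey(shuffled.iterator()).entrySet()) {
            groups.add(new AbstractMap.SimpleEntry<K, Iterable<V>>(e.getKey(), e.getValue()));
        }
        return groups;
    }

    /** Reducer fed by a partition-only shuffle: it must cluster the pairs before reducing. */
    static Map<String, Integer> reduceFromPairs(Iterator<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : clusterByKey(pairs).entrySet()) {
            sums.put(e.getKey(), sum(e.getValue()));
        }
        return sums;
    }

    /** Reducer fed by a groupByKey-style shuffle: each element is already (key, values). */
    static Map<String, Integer> reduceFromGroups(Iterator<Map.Entry<String, Iterable<Integer>>> groups) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        while (groups.hasNext()) {
            Map.Entry<String, Iterable<Integer>> g = groups.next();
            sums.put(g.getKey(), sum(g.getValue()));
        }
        return sums;
    }

    /** Example per-key reduction: sum all values of one key. */
    static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        // Shuffled input for one reduce partition; keys arrive interleaved.
        List<Map.Entry<String, Integer>> shuffled = Arrays.asList(
            pair("a", 1), pair("b", 2), pair("a", 3), pair("c", 4), pair("b", 5));

        System.out.println(reduceFromPairs(shuffled.iterator()));              // {a=4, b=7, c=4}
        System.out.println(reduceFromGroups(groupByKey(shuffled).iterator())); // {a=4, b=7, c=4}
    }
}
```

      Both paths yield the same per-key results; what changes is where the clustering work lives. One trade-off worth checking is that Spark's documentation notes groupByKey must be able to hold all values for a single key in memory, so very large groups may be a concern.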

      Attachments

        1. HIVE-7526.2.patch (3 kB, Chao Sun)
        2. HIVE-7526.3.patch (11 kB, Chao Sun)
        3. HIVE-7526.4-spark.patch (11 kB, Chao Sun)
        4. HIVE-7526.5-spark.patch (13 kB, Xuefu Zhang)
        5. HIVE-7526.patch (4 kB, Chao Sun)


            People

              Chao Sun (csun)
              Xuefu Zhang (xuefuz)
              Votes: 0
              Watchers: 5
