  Hive / HIVE-7292 Hive on Spark / HIVE-8017

Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels:

      Description

      HiveKey should be used as the key type of the pair RDD because it carries the hash code used for partitioning. BytesWritable is sufficient for partitioning in simple cases, but more complicated ones, such as joins and bucketed tables, require the hash code from HiveKey.hashCode.
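      A minimal, self-contained sketch of the partitioning difference described above, assuming for illustration that the partitioning hash is computed from only one column of a composite key. `BytesKey`, `HashCarryingKey`, `partitionFor` and `serialize` are hypothetical stand-ins, not Hive or Spark APIs: `BytesKey` derives its hash from the serialized bytes (as BytesWritable does), while `HashCarryingKey` returns a hash set by the caller (the role HiveKey plays). `partitionFor` mirrors the non-negative modulo of `key.hashCode()` that Spark's `HashPartitioner` applies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: hypothetical classes, not the Hive or Spark implementations.
public class PartitionKeySketch {

    // Stand-in for BytesWritable: hashCode() is computed from the raw key bytes.
    static class BytesKey {
        final byte[] bytes;

        BytesKey(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    // Stand-in for HiveKey: same bytes, but hashCode() ignores them and returns
    // a partitioning hash supplied by the caller.
    static class HashCarryingKey extends BytesKey {
        private final int partitionHash;

        HashCarryingKey(byte[] bytes, int partitionHash) {
            super(bytes);
            this.partitionHash = partitionHash;
        }

        @Override
        public int hashCode() {
            return partitionHash;
        }
    }

    // Non-negative modulo of key.hashCode(), the rule a hash partitioner applies.
    static int partitionFor(Object key, int numPartitions) {
        int raw = key.hashCode() % numPartitions;
        return raw < 0 ? raw + numPartitions : raw;
    }

    // Hypothetical serialization of a composite key (partitionCol, sortCol) as "p|s".
    static byte[] serialize(String partitionCol, String sortCol) {
        return (partitionCol + "|" + sortCol).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        // Two rows share the partitioning column "42" but differ in the sort column.
        byte[] k1 = serialize("42", "apple");
        byte[] k2 = serialize("42", "zebra");

        // Assumption: the partitioning hash covers only the partitioning column.
        int partHash = "42".hashCode();

        System.out.println("bytes hash:   " + partitionFor(new BytesKey(k1), numPartitions)
                + ", " + partitionFor(new BytesKey(k2), numPartitions));
        System.out.println("carried hash: " + partitionFor(new HashCarryingKey(k1, partHash), numPartitions)
                + ", " + partitionFor(new HashCarryingKey(k2, partHash), numPartitions));
    }
}
```

      With the carried hash, both rows go to partition 6 ("42".hashCode() is 1662, and 1662 mod 8 is 6), so rows that must meet in the same partition do. With the byte-derived hash, the differing sort column changes the hash, so the two rows can be routed to different partitions.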

        Attachments

        1. HIVE-8017.2-spark.patch
          76 kB
          Rui Li
        2. HIVE-8017.3-spark.patch
          97 kB
          Rui Li
        3. HIVE-8017.4-spark.patch
          75 kB
          Rui Li
        4. HIVE-8017.5-spark.patch
          76 kB
          Rui Li
        5. HIVE-8017-spark.patch
          31 kB
          Rui Li

              People

              • Assignee:
                Rui Li
                Reporter:
                Rui Li
              • Votes:
                0
                Watchers:
                3
