Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Support a sort-merge join of two datasets on the file system that have already been partitioned the same way, with the same number of partitions, and sorted within each partition. When the join keys are the partitioning/sort keys, the join should not need to sort the data again.
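      A minimal Python sketch of the requested behaviour, for illustration only. It assumes each dataset is a list of partitions (each partition a list of tuples), that both sides were partitioned by the same function of the join key with the same number of partitions (so partition i of one side only needs to meet partition i of the other), and that rows are already sorted by the join key within each partition. The function names (`check_sorted`, `merge_join_partition`, `co_partitioned_merge_join`) are hypothetical, not Spark APIs, and only the inner join is shown.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple

Row = Tuple[Any, ...]
KeyFn = Callable[[Row], Any]


def check_sorted(part: Sequence[Row], key: KeyFn) -> None:
    """Linear scan that verifies the sort precondition; it never sorts."""
    for prev, cur in zip(part, part[1:]):
        if key(cur) < key(prev):
            raise ValueError("partition is not sorted by the join key")


def merge_join_partition(left: Sequence[Row], right: Sequence[Row],
                         key: KeyFn) -> Iterator[Tuple[Row, Row]]:
    """Inner merge join of two partitions already sorted by `key`."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1  # left key is smaller: advance the left cursor
        elif rk < lk:
            j += 1  # right key is smaller: advance the right cursor
        else:
            # Equal keys: find the full run of this key on each side ...
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # ... and emit every pairing between the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    yield (l_row, r_row)
            i, j = i_end, j_end


def co_partitioned_merge_join(left_parts: Sequence[Sequence[Row]],
                              right_parts: Sequence[Sequence[Row]],
                              key: KeyFn) -> List[Tuple[Row, Row]]:
    """Join partition i with partition i only; nothing is re-sorted.

    Assumes both sides were partitioned by the same function of the join
    key, so matching keys can only live in partitions with the same index.
    """
    if len(left_parts) != len(right_parts):
        raise ValueError("both sides must have the same number of partitions")
    out: List[Tuple[Row, Row]] = []
    for lp, rp in zip(left_parts, right_parts):
        check_sorted(lp, key)
        check_sorted(rp, key)
        out.extend(merge_join_partition(lp, rp, key))
    return out


# Example: (key, value) rows partitioned by key % 2, sorted within each partition.
left = [[(2, "b"), (4, "d"), (4, "dd")], [(1, "a"), (3, "c")]]
right = [[(4, "z")], [(1, "x"), (5, "y")]]
print(co_partitioned_merge_join(left, right, key=lambda row: row[0]))
# → [((4, 'd'), (4, 'z')), ((4, 'dd'), (4, 'z')), ((1, 'a'), (1, 'x'))]
```

      Because each partition pair is merged with two forward-moving cursors, the work is linear in the partition sizes plus the size of the output; the only extra pass is the sortedness check, which verifies the precondition instead of sorting.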

      This functionality already exists in:

      • Hive (hive.optimize.bucketmapjoin.sortedmerge)
      • Pig (USING 'merge')
      • MapReduce (CompositeInputFormat)

    People

      • Assignee: Wenchen Fan (cloud_fan)
      • Reporter: Cheng Hao (chenghao)
      • Votes: 6
      • Watchers: 16
