Hive / HIVE-4006

Add local sort operator


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      We've seen in the past that sorting data on a specific column can greatly improve how well the data compresses. The problem is that a full sort is expensive and requires a reduce phase.

      One way around this is to add a local sort (either as an operator or between serialization and output). This could take chunks of rows and sort each chunk in memory. This would be much faster than a full sort, but it would need to be very memory-efficient in order to fit the maximum number of rows in a chunk (and hence get the maximum benefit).
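
To make the idea concrete, here is a minimal sketch of a chunked in-memory sort. It is not Hive code: the class name `LocalSortSketch`, the `process`/`close` methods, and the `chunkSize` parameter are all hypothetical, and a real operator would have to plug into Hive's operator tree and bound the buffer by bytes rather than by row count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not a Hive operator): buffers rows into fixed-size
 * chunks, sorts each chunk in memory, and forwards the sorted rows.
 * The output is only sorted within each chunk, not globally.
 */
public class LocalSortSketch<T> {
    private final int chunkSize;                 // assumed knob: rows per chunk
    private final Comparator<? super T> order;   // sort key, e.g. one column
    private final Consumer<? super T> downstream; // stands in for the next stage
    private final List<T> buffer;

    public LocalSortSketch(int chunkSize,
                           Comparator<? super T> order,
                           Consumer<? super T> downstream) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.order = order;
        this.downstream = downstream;
        this.buffer = new ArrayList<>(chunkSize);
    }

    /** Buffers one row; sorts and emits the chunk once it is full. */
    public void process(T row) {
        buffer.add(row);
        if (buffer.size() >= chunkSize) {
            flush();
        }
    }

    /** Emits the final, possibly partial, chunk. */
    public void close() {
        flush();
    }

    private void flush() {
        buffer.sort(order);
        for (T row : buffer) {
            downstream.accept(row);
        }
        buffer.clear();
    }
}
```

With a chunk size of 3, the input 5, 1, 4, 9, 2, 7, 3 comes out as 1, 4, 5, 2, 7, 9, 3: each chunk is sorted, but the stream as a whole is not. Larger chunks give longer sorted runs, which is why the memory footprint per row matters so much here.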

            People

              Assignee: Namit Jain
              Reporter: Namit Jain
              Votes: 0
              Watchers: 2
