Beam / BEAM-840

Add Java SDK extension to support non-distributed sorting

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: extensions-java-sorter

    Description

      Add an extension that provides a PTransform which performs local (non-distributed) sorting. It sorts in memory until the buffer is full, then flushes to disk and switches to external sorting.
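The buffer-then-spill strategy described above is essentially an external merge sort. Below is a minimal, self-contained Java sketch of that general technique. It is not the extension's code, which according to this issue delegates the external phase to Hadoop. Assumptions: the buffer limit is counted in records (`maxBufferedRecords`) rather than bytes, records are opaque byte arrays ordered by unsigned lexicographic comparison, and the class name `BufferedExternalSortSketch` is hypothetical. Each time the buffer fills, it is sorted and written to a temporary run file; at the end the runs are k-way merged with a priority queue. Requires Java 9 or later for `Arrays.compareUnsigned` and `List.of`.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.function.Consumer;

// Hypothetical sketch of a buffered external merge sort; not the extension's implementation.
public class BufferedExternalSortSketch {
  // Records are opaque byte arrays ordered by unsigned lexicographic comparison.
  private static final Comparator<byte[]> ORDER = (a, b) -> Arrays.compareUnsigned(a, b);

  private final int maxBufferedRecords; // assumption: limit by record count, not bytes
  private final List<byte[]> buffer = new ArrayList<>();
  private final List<Path> runs = new ArrayList<>();

  public BufferedExternalSortSketch(int maxBufferedRecords) {
    this.maxBufferedRecords = maxBufferedRecords;
  }

  // Buffers a record; once the buffer is full, sorts it and spills it to disk.
  public void add(byte[] record) throws IOException {
    buffer.add(record);
    if (buffer.size() >= maxBufferedRecords) spill();
  }

  // Emits every record in sorted order, merging spilled runs if there are any.
  public void sort(Consumer<byte[]> sink) throws IOException {
    if (runs.isEmpty()) { // everything fit in memory: no disk I/O at all
      buffer.sort(ORDER);
      buffer.forEach(sink);
      buffer.clear();
      return;
    }
    if (!buffer.isEmpty()) spill(); // flush the partially filled buffer as a final run
    mergeRuns(sink);
  }

  // Sorts the buffer and writes it to a temp file as length-prefixed records.
  private void spill() throws IOException {
    buffer.sort(ORDER);
    Path run = Files.createTempFile("sort-run-", ".bin");
    try (DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(run)))) {
      for (byte[] record : buffer) {
        out.writeInt(record.length);
        out.write(record);
      }
    }
    runs.add(run);
    buffer.clear();
  }

  // K-way merge: a min-heap holds the smallest unread record from each run.
  private void mergeRuns(Consumer<byte[]> sink) throws IOException {
    List<DataInputStream> readers = new ArrayList<>();
    PriorityQueue<Map.Entry<byte[], Integer>> heap =
        new PriorityQueue<>((x, y) -> ORDER.compare(x.getKey(), y.getKey()));
    try {
      for (int i = 0; i < runs.size(); i++) {
        readers.add(new DataInputStream(new BufferedInputStream(Files.newInputStream(runs.get(i)))));
        pushNext(heap, readers.get(i), i);
      }
      while (!heap.isEmpty()) {
        Map.Entry<byte[], Integer> smallest = heap.poll();
        sink.accept(smallest.getKey());
        pushNext(heap, readers.get(smallest.getValue()), smallest.getValue());
      }
    } finally {
      for (DataInputStream in : readers) in.close();
      for (Path run : runs) Files.deleteIfExists(run);
      runs.clear();
    }
  }

  // Reads the next length-prefixed record of run `i` (if any) into the heap.
  private static void pushNext(PriorityQueue<Map.Entry<byte[], Integer>> heap,
      DataInputStream in, int i) throws IOException {
    int length;
    try { length = in.readInt(); } catch (EOFException endOfRun) { return; }
    byte[] record = new byte[length];
    in.readFully(record);
    heap.add(new AbstractMap.SimpleEntry<>(record, i));
  }

  public static void main(String[] args) throws IOException {
    BufferedExternalSortSketch sorter = new BufferedExternalSortSketch(3);
    for (String s : List.of("delta", "alpha", "echo", "charlie", "bravo", "foxtrot", "golf")) {
      sorter.add(s.getBytes(StandardCharsets.UTF_8));
    }
    // Three runs are spilled; the merge prints alpha through golf in order.
    sorter.sort(r -> System.out.println(new String(r, StandardCharsets.UTF_8)));
  }
}
```

If nothing was spilled, `sort` never touches disk, which mirrors the "sort in memory until the buffer is full" behavior. Streaming merged records to a `Consumer` means the final merge only needs one buffered record per run in memory, rather than the whole sorted output.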

      It consumes a PCollection of KVs that map each primary key to an iterable of (secondary key, value) KVs, i.e. PCollection<KV<K1, Iterable<KV<K2, V>>>>, and sorts each iterable by secondary key. It would typically be applied after a GroupByKey. It uses coders to encode the secondary keys and values as byte arrays and compares the encoded secondary keys lexicographically.
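To make the comparison rule concrete, here is a minimal Java sketch that sorts one group's (secondary key, value) pairs by comparing encoded secondary keys byte by byte, treating bytes as unsigned. It assumes a hypothetical fixed-width, big-endian encoding (`encodeInt`) as a stand-in for a coder; it is not the extension's API or any Beam coder, and it encodes only the keys because values do not take part in the comparison. The class and method names are illustrative. Requires Java 9 or later for `Arrays.compareUnsigned` and `Map.entry`.

```java
import java.nio.ByteBuffer;
import java.util.*;

public class EncodedKeyOrderSketch {
  // Hypothetical stand-in for a coder: fixed-width, big-endian, two's-complement int encoding.
  static byte[] encodeInt(int value) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
  }

  // Lexicographic comparison of encoded keys, treating each byte as unsigned (0x00 to 0xFF).
  static int compareEncoded(byte[] a, byte[] b) {
    return Arrays.compareUnsigned(a, b);
  }

  // Sorts one group's (secondaryKey, value) pairs by their encoded secondary keys only.
  static List<Map.Entry<Integer, String>> sortBySecondaryKey(List<Map.Entry<Integer, String>> pairs) {
    List<Map.Entry<Integer, String>> sorted = new ArrayList<>(pairs);
    sorted.sort((x, y) -> compareEncoded(encodeInt(x.getKey()), encodeInt(y.getKey())));
    return sorted;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> group = List.of(
        Map.entry(10, "ten"), Map.entry(2, "two"), Map.entry(-1, "minus one"), Map.entry(300, "three hundred"));
    for (Map.Entry<Integer, String> e : sortBySecondaryKey(group)) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
    // Prints 2, 10, 300, then -1: the encoding of -1 is FF FF FF FF.
  }
}
```

Because the comparison runs on encoded bytes, the resulting order follows the encoding rather than the keys' natural order: with this two's-complement encoding, `-1` sorts after every non-negative key. When using a byte-level sorter like this, the coder for the secondary key therefore determines the sort order.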

      It uses Hadoop as the external sorting library.

    People

      Assignee: Mitch Shanklin (mshanklin)
      Reporter: Mitch Shanklin (mshanklin)
      Votes: 0
      Watchers: 3
