Beam / BEAM-840

Add Java SDK extension to support non-distributed sorting


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: extensions-java-sorter
    • Labels:

      Description

      Add an extension that provides a PTransform which performs local (non-distributed) sorting. It will sort in memory until the buffer is full, then flush to disk and use external sorting.
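
      The buffer-then-spill behavior described above could be sketched roughly as follows. This is only an illustration in plain Java, not the extension's code: the class name `SpillingSorterSketch`, the `maxBuffered` record limit, and the use of newline-delimited temp files are all assumptions made for the example (the actual extension is expected to delegate external sorting to Hadoop). Records are assumed to be strings without line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: sort in memory, spill sorted runs to disk, then merge. */
public class SpillingSorterSketch {
  private final int maxBuffered; // assumed limit: records held in memory
  private final List<String> buffer = new ArrayList<>();
  private final List<Path> runs = new ArrayList<>();

  public SpillingSorterSketch(int maxBuffered) {
    this.maxBuffered = maxBuffered;
  }

  /** Buffers a record; once the buffer is full, writes it out as a sorted run. */
  public void add(String record) {
    buffer.add(record);
    if (buffer.size() >= maxBuffered) {
      spill();
    }
  }

  public int spilledRuns() {
    return runs.size();
  }

  /** Sorts the buffer and writes it to a temp file, one record per line. */
  private void spill() {
    try {
      Collections.sort(buffer);
      Path run = Files.createTempFile("sorter-run", ".txt");
      run.toFile().deleteOnExit();
      Files.write(run, buffer, StandardCharsets.UTF_8);
      runs.add(run);
      buffer.clear();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns all records in sorted order (collected in a list to keep the demo short). */
  public List<String> sort() {
    if (runs.isEmpty()) { // everything fit in memory
      Collections.sort(buffer);
      return new ArrayList<>(buffer);
    }
    if (!buffer.isEmpty()) {
      spill();
    }
    List<BufferedReader> readers = new ArrayList<>();
    List<String> out = new ArrayList<>();
    try {
      // K-way merge: the queue holds the current head record of every run.
      PriorityQueue<Head> heads = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
      for (Path run : runs) {
        BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
        readers.add(reader);
        String first = reader.readLine();
        if (first != null) {
          heads.add(new Head(first, reader));
        }
      }
      while (!heads.isEmpty()) {
        Head head = heads.poll();
        out.add(head.line);
        String next = head.reader.readLine();
        if (next != null) {
          heads.add(new Head(next, head.reader));
        }
      }
      return out;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      for (BufferedReader reader : readers) {
        try {
          reader.close();
        } catch (IOException ignored) {
          // best-effort cleanup in a sketch
        }
      }
    }
  }

  private static final class Head {
    final String line;
    final BufferedReader reader;

    Head(String line, BufferedReader reader) {
      this.line = line;
      this.reader = reader;
    }
  }
}
```

      A real implementation would stream the merged output rather than collect it, and would bound the buffer by bytes rather than by record count.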

      The transform consumes a PCollection of KVs that map a primary key to an iterable of (secondary key, value) KVs, and sorts each iterable. It would typically be applied after a GroupByKey. It uses coders to encode the secondary keys and values as byte arrays, and orders records by a lexicographical comparison of the encoded secondary keys.
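
      To illustrate the comparison on encoded secondary keys, here is a minimal plain-Java sketch. It is not taken from the extension: the class `SecondaryKeySortSketch`, the `EncodedRecord` holder, and the UTF-8 `encode` helper (standing in for a Beam coder) are hypothetical, and treating bytes as unsigned is an assumption about the intended ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: order records by lexicographic comparison of encoded keys. */
public class SecondaryKeySortSketch {

  /** An encoded (secondary key, value) pair; stands in for a KV after coder encoding. */
  public static final class EncodedRecord {
    public final byte[] key;
    public final byte[] value;

    public EncodedRecord(byte[] key, byte[] value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Stand-in for a coder: encodes a string as UTF-8 bytes. */
  public static byte[] encode(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /** Lexicographic byte comparison; each byte is treated as unsigned (assumption). */
  public static int compareBytes(byte[] a, byte[] b) {
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    // When one array is a prefix of the other, the shorter one sorts first.
    return Integer.compare(a.length, b.length);
  }

  /** Returns a copy of the records sorted by encoded secondary key. */
  public static List<EncodedRecord> sortBySecondaryKey(List<EncodedRecord> records) {
    List<EncodedRecord> sorted = new ArrayList<>(records);
    sorted.sort((r1, r2) -> compareBytes(r1.key, r2.key));
    return sorted;
  }

  public static void main(String[] args) {
    List<EncodedRecord> records = new ArrayList<>();
    records.add(new EncodedRecord(encode("pear"), encode("3")));
    records.add(new EncodedRecord(encode("apple"), encode("1")));
    records.add(new EncodedRecord(encode("app"), encode("2")));
    for (EncodedRecord r : sortBySecondaryKey(records)) {
      System.out.println(new String(r.key, StandardCharsets.UTF_8));
    }
  }
}
```

      Because the order is defined on the encoded bytes, it matches the natural order of the keys only when the coder's encoding preserves that order.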

      Uses Hadoop as an external sorting library.


            People

            • Assignee: Mitch Shanklin (mshanklin)
            • Reporter: Mitch Shanklin (mshanklin)
