  Beam / BEAM-3247

Sample.any memory constraint

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: sdk-java-core

    Description

      Right now, Sample.any converts the collection to an iterable view and takes the first n elements from it as a side input. This may require materializing the entire collection to disk, which is potentially inefficient.
      https://github.com/apache/beam/blob/v2.1.0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java#L74

      This can be fixed by first applying a truncating `DoFn`, then combining the results into a `List<T>` whose size is capped at n, and finally flattening that list.
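
      The three steps proposed above can be modeled in plain Java, as a rough sketch only. It uses no Beam classes; the class name `SampleAnySketch`, its method names, and the representation of bundles as nested lists are all hypothetical and chosen for illustration. It is not the code that resolved this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the proposed fix; hypothetical names, no Beam API involved. */
public class SampleAnySketch {

  /** Step 1, the "truncating DoFn": pass through at most n elements of one bundle. */
  public static <T> List<T> truncate(Iterable<T> bundle, int n) {
    List<T> out = new ArrayList<>();
    for (T element : bundle) {
      if (out.size() >= n) {
        break; // stop early so the rest of the bundle is never buffered
      }
      out.add(element);
    }
    return out;
  }

  /** Step 2, the combine: merge partial lists while never holding more than n elements. */
  public static <T> List<T> combineBounded(List<List<T>> partials, int n) {
    List<T> acc = new ArrayList<>();
    for (List<T> partial : partials) {
      for (T element : partial) {
        if (acc.size() >= n) {
          return acc;
        }
        acc.add(element);
      }
    }
    return acc;
  }

  /** Steps 1 to 3 chained together. */
  public static <T> List<T> sampleAny(List<List<T>> bundles, int n) {
    List<List<T>> truncated = new ArrayList<>();
    for (List<T> bundle : bundles) {
      truncated.add(truncate(bundle, n));
    }
    List<T> combined = combineBounded(truncated, n);
    // Step 3, the flatten: hand back the elements of the single combined list.
    return new ArrayList<>(combined);
  }

  public static void main(String[] args) {
    List<List<Integer>> bundles = Arrays.asList(
        Arrays.asList(1, 2, 3, 4), Arrays.asList(5, 6), Arrays.asList(7, 8, 9));
    System.out.println(sampleAny(bundles, 3)); // prints [1, 2, 3]
  }
}
```

      In this model no stage ever holds more than n elements per bundle or per accumulator, which is the property the proposal relies on. In a real pipeline the elements that survive would depend on bundle ordering, so the deterministic result here is an artifact of the sketch.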

            People

              Assignee: sinisa_lyh Neville Li
              Reporter: sinisa_lyh Neville Li
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: