Beam / BEAM-3247

Sample.any memory constraint


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: sdk-java-core
    • Labels:

      Description

      Currently, Sample.any converts the collection to an iterable side-input view and takes the first n elements from it. This may require materializing the entire collection to disk, which is potentially inefficient.
      https://github.com/apache/beam/blob/v2.1.0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java#L74

      This can be fixed by first applying a truncating `DoFn` that limits how many elements each bundle emits, then combining into a `List<T>` whose size is capped at n, and finally flattening that list back into individual elements.
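      To make the proposal concrete, here is a minimal plain-Java model of the three stages, assuming each inner list stands for one bundle. It does not use Beam's API: `SampleAnySketch`, `truncateBundle`, `addInput`, `mergeAccumulators`, `flatten` and `sampleAny` are hypothetical names that only mirror the shape of a `DoFn` followed by a `CombineFn` with a `List<T>` accumulator. The point it illustrates is that no truncated bundle and no single accumulator ever holds more than `limit` elements, unlike a side-input view that may materialize the whole collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical plain-Java model (no Beam dependency) of the proposed plan:
 * truncate each bundle, combine into a size-capped List<T>, then flatten.
 */
public class SampleAnySketch {

  // Stage 1: the "truncating DoFn", modeled as a per-bundle step that
  // forwards at most `limit` elements from each bundle.
  static <T> List<T> truncateBundle(List<T> bundle, int limit) {
    return new ArrayList<>(bundle.subList(0, Math.min(limit, bundle.size())));
  }

  // Stage 2: CombineFn-style operations on a List<T> accumulator that is
  // never allowed to grow beyond `limit`.
  static <T> List<T> createAccumulator() {
    return new ArrayList<>();
  }

  static <T> List<T> addInput(List<T> acc, T input, int limit) {
    if (acc.size() < limit) {
      acc.add(input);
    }
    return acc;
  }

  static <T> List<T> mergeAccumulators(List<List<T>> accs, int limit) {
    List<T> merged = new ArrayList<>();
    for (List<T> acc : accs) {
      for (T value : acc) {
        if (merged.size() >= limit) {
          return merged; // cap reached: remaining partials are dropped
        }
        merged.add(value);
      }
    }
    return merged;
  }

  // Stage 3: "flatten" the single capped List<T> back into individual
  // elements, the way a flatten step turns one Iterable<T> into many T.
  static <T> List<T> flatten(List<T> combined) {
    List<T> out = new ArrayList<>();
    for (T value : combined) {
      out.add(value);
    }
    return out;
  }

  // Runs all three stages over a list of bundles and returns at most
  // `limit` elements.
  static <T> List<T> sampleAny(List<List<T>> bundles, int limit) {
    List<List<T>> partials = new ArrayList<>();
    for (List<T> bundle : bundles) {
      List<T> acc = createAccumulator();
      for (T value : truncateBundle(bundle, limit)) {
        addInput(acc, value, limit);
      }
      partials.add(acc);
    }
    return flatten(mergeAccumulators(partials, limit));
  }

  public static void main(String[] args) {
    List<List<Integer>> bundles = Arrays.asList(
        Arrays.asList(1, 2, 3, 4, 5),
        Arrays.asList(6, 7),
        Arrays.asList(8, 9, 10, 11));
    System.out.println(sampleAny(bundles, 3)); // prints [1, 2, 3]
  }
}
```

      Here the first bundle alone fills the cap, so merging stops early. In a real pipeline the order in which partial results are merged is not fixed, so which n elements survive should be treated as arbitrary.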


              People

              • Assignee: Neville Li (sinisa_lyh)
                Reporter: Neville Li (sinisa_lyh)
              • Votes: 0
                Watchers: 2
