Beam / BEAM-4786

Distinct has bad parallelism characteristics

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Component: sdk-java-core

    Description

      Distinct currently groups elements first and only then drops the extra (duplicate) elements. It should instead drop duplicates in the mappers first, and then drop the remaining ones in the reducers.

      https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Distinct.java#L100
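
      The two-phase approach requested above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use any Beam API, it is not the code in Distinct.java, and the class and method names (TwoPhaseDistinct, dedupLocally, dedupGlobally) are hypothetical. Each inner list stands in for the input seen by one mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Beam's Distinct implementation.
class TwoPhaseDistinct {

  // "Mapper" phase: drop duplicates within one partition,
  // before anything would be sent over the shuffle.
  static <T> Set<T> dedupLocally(List<T> partition) {
    return new LinkedHashSet<>(partition);
  }

  // "Reducer" phase: merge the locally deduplicated partitions and
  // drop the duplicates that occurred across different partitions.
  static <T> List<T> dedupGlobally(List<Set<T>> partials) {
    Set<T> merged = new LinkedHashSet<>();
    for (Set<T> partial : partials) {
      merged.addAll(partial);
    }
    return new ArrayList<>(merged);
  }

  // Runs both phases over a list of simulated partitions.
  static <T> List<T> distinct(List<List<T>> partitions) {
    List<Set<T>> partials = new ArrayList<>();
    for (List<T> partition : partitions) {
      partials.add(dedupLocally(partition));
    }
    return dedupGlobally(partials);
  }

  public static void main(String[] args) {
    List<List<String>> partitions = Arrays.asList(
        Arrays.asList("a", "a", "b", "a"),
        Arrays.asList("b", "c", "c"));
    System.out.println(distinct(partitions)); // prints [a, b, c]
  }
}
```

      In this toy input, the local phase reduces the seven input elements to four before the simulated shuffle. Grouping first would move all seven.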

          People

            Assignee: Unassigned
            Reporter: Pablo Estrada (pabloem)
            Votes: 0
            Watchers: 5
