Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-5690

Issue with GroupByKey in BeamSql using SparkRunner

Details

    • Task
    • Status: Triage Needed
    • P2
    • Resolution: Fixed
    • None
    • 2.19.0
    • runner-spark
    • None

    Description

      Reported on user@

      We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window).
      Below is the pipeline:

      KafkaSource (KafkaIO)
      ---> Windowing (FixedWindow 1min)
      ---> BeamSql
      ---> KafkaSink (KafkaIO)

      We are using Spark Runner for this.
      The BeamSql query is:

      select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3

      We are grouping by Col3 which is a string. It can hold values string[0-9].

      The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected.
      Below is the output observed: (WST and WET are indicators for window start time and window end time)

      {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0}
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              kenn Kenneth Knowles
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4.5h
                  4.5h