Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-5690

Issue with GroupByKey in BeamSql using SparkRunner

    XMLWordPrintableJSON

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.19.0
    • Component/s: runner-spark
    • Labels:
      None

      Description

      Reported on user@

      We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window).
      Below is the pipeline:

      KafkaSource (KafkaIO)
      ---> Windowing (FixedWindow 1min)
      ---> BeamSql
      ---> KafkaSink (KafkaIO)

      We are using Spark Runner for this.
      The BeamSql query is:

      select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3

      We are grouping by Col3 which is a string. It can hold values string[0-9].

      The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected.
      Below is the output observed: (WST and WET are indicators for window start time and window end time)

      {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0000  +0000"}
      {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 0000  +0000","WET":"2018-10-09  09-56-00 0}
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                kenn Kenneth Knowles
              • Votes:
                1 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4.5h
                  4.5h