Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-23869

Spark 2.3.0 left outer join not emitting null values instead waiting for the record in other stream

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Invalid
    • 2.3.0
    • None
    • Spark Core
    • None

    Description

      Left outer join on two streams not emitting the null outputs. It is just waiting for the record to be added to other stream. Used socketstream to test this. In our case we want to emit the records with null values which doesn't match with id or/and not fall in time range condition

      Details of the watermarks and intervals are:

      val ds1Map = ds1
      .selectExpr("Id AS ds1_Id", "ds1_timestamp")
      .withWatermark("ds1_timestamp","10 seconds")

      val ds2Map = ds2
      .selectExpr("Id AS ds2_Id", "ds2_timestamp")
      .withWatermark("ds2_timestamp", "20 seconds")

      val output = ds1Map.join( ds2Map,
      expr(
      """ ds1_Id = ds2_Id AND ds2_timestamp >= ds1_timestamp AND  ds2_timestamp <= ds1_timestamp + interval 1 minutes """),
      "leftOuter")

      val query = output.select("*")
      .writeStream

      .outputMode(OutputMode.Append)
      .format("console")
      .option("checkpointLocation", "./spark-checkpoints/")
      .start()

      query.awaitTermination()

      Thank you. 

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            abharath9 bharath kumar avusherla
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: