SPARK-24382

Spark Structured Streaming aggregation on old timestamp data


    Description

      I am trying to aggregate the count of records every 10 seconds with Structured Streaming over the following incoming Kafka data (a sketch of how the stream might be parsed follows the sample records):

      { 
      "ts2" : "2018/05/01 00:02:50.041", 
      "serviceGroupId" : "123", 
      "userId" : "avv-0", 
      "stream" : "", 
      "lastUserActivity" : "00:02:50", 
      "lastUserActivityCount" : "0" 
      } 
      { 
      "ts2" : "2018/05/01 00:09:02.079", 
      "serviceGroupId" : "123", 
      "userId" : "avv-0", 
      "stream" : "", 
      "lastUserActivity" : "00:09:02", 
      "lastUserActivityCount" : "0" 
      } 
      { 
      "ts2" : "2018/05/01 00:09:02.086", 
      "serviceGroupId" : "123", 
      "userId" : "avv-2", 
      "stream" : "", 
      "lastUserActivity" : "00:09:02", 
      "lastUserActivityCount" : "0" 
      }
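
      The report does not show how df is derived from the Kafka source. A minimal sketch, assuming the JSON layout above, a hypothetical broker address and topic name, and the yyyy/MM/dd HH:mm:ss.SSS timestamp format seen in ts2 (withWatermark needs a TimestampType column, so the string is cast explicitly):

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.types._

      val spark = SparkSession.builder().appName("tune-ins").getOrCreate()
      import spark.implicits._

      // Schema inferred from the sample records; every field arrives as a string.
      val schema = StructType(Seq(
        StructField("ts2", StringType),
        StructField("serviceGroupId", StringType),
        StructField("userId", StringType),
        StructField("stream", StringType),
        StructField("lastUserActivity", StringType),
        StructField("lastUserActivityCount", StringType)
      ))

      val df = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
        .option("subscribe", "tune-ins")                     // assumed topic name
        .load()
        .select(from_json($"value".cast("string"), schema).as("rec"))
        .select("rec.*")
        // Watermarks require event time as TimestampType, not String.
        .withColumn("ts2", to_timestamp($"ts2", "yyyy/MM/dd HH:mm:ss.SSS"))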
      

      With the following logic:

      val sdvTuneInsAgg1 = df
        .withWatermark("ts2", "10 seconds")
        .groupBy(window(col("ts2"), "10 seconds"))
        .agg(count("*") as "count")
        .as[CountMetric1]

      val query1 = sdvTuneInsAgg1.writeStream
        .format("console")
        .foreach(writer)
        .start()


      I do not see any records reach the writer. The only anomaly is that the current date is 2018/05/24, while the records being processed carry much older ts2 values. Will aggregation / count work in this scenario?
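
      Two things may be worth checking here, under the assumption that the code above runs as written. First, the event-time watermark advances from the maximum ts2 observed in the stream, not from the wall clock, so old dates by themselves should not stop the aggregation. Second, with the default append output mode a windowed aggregate is only emitted once the watermark passes the end of its window, and calling .foreach(writer) after .format("console") replaces the console sink, so only the foreach sink is used. A sketch of a variant that surfaces counts as they update, reusing df, writer and CountMetric1 from the report:

      // Hypothetical variant, not the reported code: "update" mode emits
      // changed window counts on every trigger instead of waiting for the
      // watermark to close each window.
      val aggregated = df
        .withWatermark("ts2", "10 seconds")
        .groupBy(window(col("ts2"), "10 seconds"))
        .agg(count("*") as "count")
        .as[CountMetric1]

      val query = aggregated.writeStream
        .outputMode("update") // default "append" holds results until the
                              // watermark passes each window's end
        .foreach(writer)      // a single sink; no competing format() call
        .start()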

People

    Assignee: Unassigned
    Reporter: karthikus Karthik