Description
It is common for customers to regularly extract updated data from an external data source (e.g. MySQL or PostgreSQL). While this is possible today, the syntax is a little onerous. With some small improvements to the analyzer, I think we could make this much easier.
Goal: Allow users to execute the following two queries, as well as their DataFrame equivalents.
To find the most recent record for each key:
SELECT max(struct(timestamp, *)) as mostRecentRecord GROUP BY key
To unnest the struct from above:
SELECT mostRecentRecord.* FROM data
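The two queries rely on structs being ordered field by field, so placing the timestamp first makes max() pick the newest record. As a rough illustration of that idea only (plain Python, not Spark, and not the proposed analyzer change), the sketch below emulates both steps with tuples, which Python also compares left to right. The Record type, its field names, and most_recent_per_key are hypothetical names invented for this example.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions for illustration.
Record = namedtuple("Record", ["key", "timestamp", "value"])


def most_recent_per_key(rows):
    """Emulate max(struct(timestamp, *)) ... GROUP BY key using tuples."""
    best = {}
    for row in rows:
        # Timestamp goes first so the left-to-right comparison is driven by it,
        # mirroring struct(timestamp, *).
        candidate = (row.timestamp, row)
        if row.key not in best or candidate > best[row.key]:
            best[row.key] = candidate
    # "Unnest" the struct: drop the leading timestamp and keep the full record,
    # which corresponds to SELECT mostRecentRecord.*.
    return {key: record for key, (_, record) in best.items()}


rows = [
    Record("a", 1, "old"),
    Record("a", 3, "new"),
    Record("b", 2, "only"),
]
latest = most_recent_per_key(rows)
```

Here latest maps each key to its full most recent record, so the other columns come along with the maximum timestamp without a self-join.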
Issue Links
- is duplicated by SPARK-11637 Alias do not work with udf with * parameter (Resolved)