CrawlDatum already supports Jexl expressions on its metadata fields, but it cannot select on attributes such as fetchTime and modifiedTime.
This change adds that, including a rudimentary date parser that supports only the yyyy-MM-dd'T'HH:mm:ss'Z' format:
Dump everything with a modifiedTime higher than 2016-03-20T00:00:00Z
Dump everything that is an HTML file
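To make the date restriction concrete, here is a minimal sketch of how a literal in the single supported format could be turned into epoch milliseconds for comparison against fetchTime or modifiedTime. The class and method names are hypothetical and not taken from the patch, and the sketch assumes the trailing Z is a literal character that is always interpreted as UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical helper for illustration; not the parser shipped in the patch. */
final class JexlDateSketch {
    // The only format the description says is supported; 'Z' is matched literally.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    private JexlDateSketch() {}

    /** Parses the literal and returns epoch milliseconds, assuming UTC. */
    static long toEpochMillis(String literal) {
        return LocalDateTime.parse(literal, FORMAT)
            .toInstant(ZoneOffset.UTC)
            .toEpochMilli();
    }

    /** Returns true only if the text matches the supported format exactly. */
    static boolean isSupported(String text) {
        try {
            toEpochMillis(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

With this sketch, a date-only value such as 2016-03-20 or a value with a numeric offset is rejected, which mirrors the "only this format" restriction described above.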
Keep in mind:
- Jexl doesn't allow a hyphen/minus in field identifiers, so hyphens are transformed to underscores
- string literals must be in quotes; only the surrounding quote character needs to be escaped with a backslash
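The two rules above can be sketched as small helpers for anyone building expressions programmatically. Everything here is an assumption for illustration: the class name, the method names, and the Content-Type metadata key are hypothetical and do not come from the patch.

```java
/** Hypothetical helpers for illustration; not part of the patch. */
final class JexlExpressionSketch {
    private JexlExpressionSketch() {}

    /** Maps a metadata key to a Jexl-safe identifier: hyphens become underscores. */
    static String toIdentifier(String metadataKey) {
        return metadataKey.replace('-', '_');
    }

    /**
     * Wraps a value in the given quote character and backslash-escapes only
     * occurrences of that same quote character inside the value.
     */
    static String quote(String value, char quoteChar) {
        String q = String.valueOf(quoteChar);
        return q + value.replace(q, "\\" + q) + q;
    }

    /** Builds an equality test such as: Content_Type == 'text/html' */
    static String equalsExpression(String metadataKey, String value) {
        return toIdentifier(metadataKey) + " == " + quote(value, '\'');
    }
}
```

For example, `equalsExpression("Content-Type", "text/html")` yields `Content_Type == 'text/html'`, assuming the metadata key is stored as Content-Type.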