Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Blur should support "scanner" plugins that, given a query, are handed all the records matching that query. From the Thrift API's perspective, these would be asynchronous, long-running calls.
In effect, the scanner would be given a collector of the hits, each carrying the fields defined by the selector passed in.
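To make the plugin contract concrete, here is a minimal sketch in Python. Blur itself is Java, so treat this as runnable pseudocode for the pattern, not an API. Every name in it (`Scanner`, `collect`, `InMemoryScanner`) is hypothetical, a hit is modeled as a plain dict of field name to value, and the selector is reduced to a set of field names. The collector is a generator, so hits stream to the scanner rather than being materialized up front.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, List, Set

# Hypothetical model of a hit: field name -> value.
Hit = Dict[str, str]


class Scanner(ABC):
    """Hypothetical plugin contract: handed every hit matching the query."""

    @abstractmethod
    def scan(self, hits: Iterable[Hit], properties: Dict[str, str]) -> None:
        ...


def collect(records: Iterable[Hit],
            matches: Callable[[Hit], bool],
            selector_fields: Set[str]) -> Iterator[Hit]:
    """Lazily yield matching records, keeping only the selector's fields."""
    for record in records:
        if matches(record):
            yield {k: v for k, v in record.items() if k in selector_fields}


class InMemoryScanner(Scanner):
    """Trivial example plugin that simply keeps every hit it is handed."""

    def __init__(self) -> None:
        self.seen: List[Hit] = []

    def scan(self, hits: Iterable[Hit], properties: Dict[str, str]) -> None:
        self.seen.extend(hits)


if __name__ == "__main__":
    records = [{"id": "1", "name": "a", "body": "x"},
               {"id": "2", "name": "b", "body": "y"}]
    scanner = InMemoryScanner()
    scanner.scan(collect(records, lambda r: r["name"] == "a", {"id", "name"}), {})
    print(scanner.seen)  # prints [{'id': '1', 'name': 'a'}]
```

The `properties` argument mirrors the `properties` map in the proposed `ScannerQuery` struct, which is where scanner-specific options would plausibly travel.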
The client would request a scan, poll its status periodically and then, depending on the Scanner implementation, pick up the results in whatever form it requested.
For a concrete implementation, consider export. An ExportScanner would be given a location in HDFS, scan over all the results, and write them to that directory, possibly in a particular requested format. The Scanner pattern could have many other useful implementations, though; for example, inserting a subset of the data into a new Blur table.
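A minimal sketch of the export case, again in Python with hits modeled as dicts. A local directory stands in for the HDFS location, and `output.dir` and `format` are assumed keys in the `properties` map; neither is defined anywhere in Blur, and `ExportScanner` here is purely illustrative.

```python
import json
from pathlib import Path
from typing import Dict, Iterable

# Output formats this sketch understands (an assumption, not a Blur list).
SUPPORTED_FORMATS = {"json", "tsv"}


class ExportScanner:
    """Hypothetical export plugin: writes every hit it is handed to a directory.

    A local directory stands in for HDFS; "output.dir" and "format" are
    assumed property keys, not an existing Blur API.
    """

    def scan(self, hits: Iterable[Dict[str, str]], properties: Dict[str, str]) -> int:
        fmt = properties.get("format", "json")
        # Validate before touching the filesystem so bad requests leave no files.
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
        out_dir = Path(properties["output.dir"])
        out_dir.mkdir(parents=True, exist_ok=True)
        count = 0
        # Single output file for simplicity; a real export would likely shard.
        with open(out_dir / "part-00000", "w", encoding="utf-8") as out:
            for hit in hits:
                if fmt == "json":
                    line = json.dumps(hit, sort_keys=True)
                else:  # tsv: values in sorted field-name order
                    line = "\t".join(hit[k] for k in sorted(hit))
                out.write(line + "\n")
                count += 1
        return count
```

Returning the number of hits written gives the status side of the API something cheap to report; whether the real design would expose counts is an open question.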
Some thoughts on the client API:
struct ScannerQuery {
  1: Query query,
  2: Selector selector,
  3: string id,
  4: string userContext,
  5: string scannerName,
  6: i64 startTime = 0,
  7: map<string,string> properties
}

enum ScanStatus {
  COMPLETE,
  RUNNING,
  ERROR
}

// Methods proposed for the Blur Thrift service:
void scan(1: ScannerQuery scannerQuery) throws (1: BlurException ex)
list<string> scanList() throws (1: BlurException ex)
ScanStatus statusScan(1: string scanId) throws (1: BlurException ex)
void cancelScan(1: string scanId) throws (1: BlurException ex)
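The asynchronous lifecycle behind `scan`, `scanList`, `statusScan`, and `cancelScan` can be modeled in memory roughly as follows. This is a sketch, not a server implementation: `ScanManager` and `wait_for` are hypothetical, scan ids play the role of `ScannerQuery.id`, and the `work` callable stands in for running a Scanner over all the hits. Cancellation is assumed to be cooperative, signalled through a `threading.Event`.

```python
import enum
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class ScanStatus(enum.Enum):
    COMPLETE = 0
    RUNNING = 1
    ERROR = 2


class ScanManager:
    """Hypothetical in-memory model of the proposed scan lifecycle."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: Dict[str, Future] = {}
        self._cancel: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def scan(self, scan_id: str, work: Callable[[threading.Event], object]) -> None:
        """Start `work` asynchronously; it should return early once its event is set."""
        with self._lock:
            if scan_id in self._futures:
                raise ValueError(f"scan id already in use: {scan_id}")
            event = threading.Event()
            self._cancel[scan_id] = event
            self._futures[scan_id] = self._pool.submit(work, event)

    def scan_list(self) -> List[str]:
        with self._lock:
            return sorted(self._futures)

    def status_scan(self, scan_id: str) -> ScanStatus:
        with self._lock:
            future = self._futures[scan_id]
        if not future.done():
            return ScanStatus.RUNNING
        # exception() returns immediately here because the future is done.
        return ScanStatus.ERROR if future.exception() is not None else ScanStatus.COMPLETE

    def cancel_scan(self, scan_id: str) -> None:
        """Request cooperative cancellation; the work decides when to stop."""
        with self._lock:
            self._cancel[scan_id].set()


def wait_for(manager: ScanManager, scan_id: str,
             interval: float = 0.01, timeout: float = 5.0) -> ScanStatus:
    """Client-side polling loop: check status until the scan is no longer RUNNING."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = manager.status_scan(scan_id)
        if status is not ScanStatus.RUNNING:
            return status
        time.sleep(interval)
    raise TimeoutError(f"scan {scan_id} still running after {timeout}s")
```

One design gap the sketch exposes: the proposed `ScanStatus` enum has no cancelled state, so a cancelled scan here reports COMPLETE once its work returns. A real design might want a separate value to distinguish the two.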