[BLUR-344] Expose a Scanner capability that allows various implementations (e.g. ExportScanner) - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Blur
Labels: None

Description

Blur should support "scanner" plugins that, given a query, are handed all of the records matching that query. From the Thrift API perspective, these would be asynchronous, long-running calls.

The scanner would essentially be given a collector of the hits, populated with the fields defined by the passed-in selector.
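
The plugin contract described above could be sketched in Java roughly as follows. The server pushes every matching hit into the scanner through a collector-style callback, and each hit carries only the fields the selector asked for. All names here (`Hit`, `Scanner`, `begin`/`collect`/`finish`, `ScanDriver`, `CountingScanner`) and the use of a plain field map per record are hypothetical illustrations, not part of Blur's API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical: one matching record, holding only the fields chosen by the selector. */
final class Hit {
    final String rowId;
    final String recordId;
    final Map<String, String> fields;

    Hit(String rowId, String recordId, Map<String, String> fields) {
        this.rowId = rowId;
        this.recordId = recordId;
        this.fields = Collections.unmodifiableMap(fields);
    }
}

/** Hypothetical scanner plugin contract: the server pushes every hit of the query into it. */
interface Scanner {
    /** Called once before any hits, with the properties from the ScannerQuery. */
    void begin(Map<String, String> properties);

    /** Called once per matching record. */
    void collect(Hit hit);

    /** Called after the last hit; a real scanner would flush or commit its output here. */
    void finish();
}

/** Hypothetical server-side driver: feeds every hit of a query into the scanner, in order. */
final class ScanDriver {
    static void run(Scanner scanner, Iterable<Hit> hits, Map<String, String> properties) {
        scanner.begin(properties);
        for (Hit hit : hits) {
            scanner.collect(hit);
        }
        scanner.finish();
    }
}

/** Hypothetical example plugin that simply counts matching records per row. */
final class CountingScanner implements Scanner {
    private final Map<String, Integer> perRow = new TreeMap<>();
    private boolean finished;

    public void begin(Map<String, String> properties) {
        perRow.clear();
        finished = false;
    }

    public void collect(Hit hit) {
        perRow.merge(hit.rowId, 1, Integer::sum);
    }

    public void finish() {
        finished = true;
    }

    Map<String, Integer> counts() { return perRow; }

    boolean isFinished() { return finished; }
}
```

Keeping the scanner push-based means the server stays in control of iteration order and resource usage. The plugin only decides what to do with each hit.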

The client would request a scan, poll its status periodically, and then, depending on the Scanner implementation, pick up the results in whatever form was requested.
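
On the client side, the request/poll/collect cycle might look like the sketch below. `ScanApi` is a hypothetical Java stand-in for a generated Thrift client exposing the proposed `scan` and `statusScan` calls. It is simplified to a scan id and scanner name instead of a full `ScannerQuery` so the example is self-contained. The poll interval and timeout are arbitrary caller-supplied values.

```java
import java.util.concurrent.TimeUnit;

/** Mirrors the proposed ScanStatus enum. */
enum ScanStatus { COMPLETE, RUNNING, ERROR }

/** Hypothetical, simplified client view of the proposed scan calls. */
interface ScanApi {
    void scan(String scanId, String scannerName);
    ScanStatus statusScan(String scanId);
}

final class ScanPoller {
    /**
     * Starts a scan, then polls its status until it leaves RUNNING.
     * Returns the terminal status (COMPLETE or ERROR); throws if the timeout expires first.
     */
    static ScanStatus runAndWait(ScanApi api, String scanId, String scannerName,
                                 long pollMillis, long timeoutMillis) {
        api.scan(scanId, scannerName);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            ScanStatus status = api.statusScan(scanId);
            if (status != ScanStatus.RUNNING) {
                // After COMPLETE, results are picked up in a scanner-specific way.
                return status;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("scan " + scanId + " still running after timeout");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                throw new IllegalStateException("interrupted while waiting for scan " + scanId, e);
            }
        }
    }
}
```

A caller would typically invoke `cancelScan` when it gives up after a timeout, so the server does not keep working on an abandoned scan.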

For a concrete implementation, consider export. The ExportScanner would be given a location in HDFS, scan over all the results, and write them to that directory, possibly in a particular requested format. The Scanner pattern could have many other useful implementations, though; for example, inserting a subset of the data into a new Blur table.
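
An ExportScanner along these lines could be sketched as follows. This is a minimal illustration under several assumptions. A local `java.nio` directory stands in for the HDFS location, and a real version would write through Hadoop's `FileSystem` API instead. The `export.dir` property name, the `part-00000.tsv` file name, and the tab-separated output format are hypothetical choices, not part of the proposal.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical ExportScanner sketch: writes one tab-separated line per record
 * (rowId, recordId, then name=value pairs). Tabs inside values are not escaped.
 */
final class ExportScanner {
    private BufferedWriter out;
    private long written;

    /** Expects a hypothetical "export.dir" property naming the target directory. */
    void begin(Map<String, String> properties) {
        Path dir = Paths.get(properties.get("export.dir"));
        try {
            Files.createDirectories(dir);
            out = Files.newBufferedWriter(dir.resolve("part-00000.tsv"), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        written = 0;
    }

    /** Appends one matching record, with only the selector's fields, to the export file. */
    void collect(String rowId, String recordId, Map<String, String> fields) {
        StringBuilder line = new StringBuilder(rowId).append('\t').append(recordId);
        // Sort fields by name so the output is deterministic regardless of map type.
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            line.append('\t').append(e.getKey()).append('=').append(e.getValue());
        }
        try {
            out.write(line.toString());
            out.newLine();
            written++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the export file and returns the number of records written. */
    long finish() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

A "requested form" could be supported by choosing a different line encoder from a property in `begin`. The same begin/collect/finish shape would also fit the "insert into a new Blur table" idea, with mutations written in place of file lines.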

Here are some client API thoughts:

struct ScannerQuery {
  1:Query query,
  2:Selector selector,
  3:string id,
  4:string userContext,
  5:string scannerName,
  6:i64 startTime = 0,
  7:map<string,string> properties
}

enum ScanStatus {
  COMPLETE,
  RUNNING,
  ERROR
}

  void scan(
    1:ScannerQuery scannerQuery
  ) throws (1:BlurException ex)

  list<string> scanList(
  ) throws (1:BlurException ex)

  ScanStatus statusScan(
    1:string scanId
  ) throws (1:BlurException ex)

  void cancelScan(
    1:string scanId
  ) throws (1:BlurException ex)
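
The following minimal in-memory Java sketch shows how a server might track scans behind these four calls, which makes their asynchronous lifecycle concrete. `ScanRegistry` is hypothetical. Each scan's work is a plain `Runnable` rather than a Scanner fed by a query, and `IllegalArgumentException` stands in for `BlurException`. Cancellation is modelled as thread interruption. A cancelled scan is reported as ERROR here only because the proposed enum has no separate cancelled state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Mirrors the proposed Thrift enum. */
enum ScanStatus { COMPLETE, RUNNING, ERROR }

/** Hypothetical in-memory registry backing scan / scanList / statusScan / cancelScan. */
final class ScanRegistry {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, Future<?>> scans = new ConcurrentHashMap<>();

    /** scan(): registers the id, starts the work asynchronously, and returns immediately. */
    void scan(String scanId, Runnable work) {
        FutureTask<Void> task = new FutureTask<>(work, null);
        if (scans.putIfAbsent(scanId, task) != null) {
            throw new IllegalArgumentException("duplicate scan id: " + scanId);
        }
        pool.execute(task);
    }

    /** scanList(): ids of all known scans, whether running or finished. */
    List<String> scanList() {
        return new ArrayList<>(scans.keySet());
    }

    /** statusScan(): RUNNING until the work ends; ERROR if it threw or was cancelled. */
    ScanStatus statusScan(String scanId) {
        Future<?> f = lookup(scanId);
        if (!f.isDone()) {
            return ScanStatus.RUNNING;
        }
        if (f.isCancelled()) {
            return ScanStatus.ERROR;
        }
        try {
            f.get(); // does not block: the task is already done
            return ScanStatus.COMPLETE;
        } catch (ExecutionException e) {
            return ScanStatus.ERROR;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return ScanStatus.RUNNING;
        }
    }

    /** cancelScan(): interrupts the scan's thread if it is still running. */
    void cancelScan(String scanId) {
        lookup(scanId).cancel(true);
    }

    void shutdown() {
        pool.shutdownNow();
    }

    private Future<?> lookup(String scanId) {
        Future<?> f = scans.get(scanId);
        if (f == null) {
            throw new IllegalArgumentException("unknown scan id: " + scanId);
        }
        return f;
    }
}
```

Because `ScannerQuery` carries a client-supplied `id`, the sketch rejects duplicate ids rather than generating its own. A production version would also need to evict finished scans and share state across servers.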

People

Assignee: Tim Williams

Reporter: Tim Williams

Votes: 0

Watchers: 2

Dates

Created: 18/Jul/14 00:33

Updated: 11/Aug/14 17:49