VXQuery is an XQuery compiler and runtime being built to comply with version 1.0 of the XQuery spec at http://www.w3.org/TR/XQuery. The distinguishing characteristic of this runtime is that it is designed to evaluate queries on large amounts of XML data. VXQuery uses the Hyracks platform (http://code.google.com/p/hyracks), a parallel dataflow engine, to parallelize queries so they can run on a cluster of shared-nothing computers.
We plan to exploit three kinds of parallelism within the XQuery engine while evaluating a single query.
1. Independent parallelism: Parts of a query that are unrelated to each other can be evaluated in parallel.
2. Partitioned parallelism: The engine partitions data (both input data as well as intermediate data) and processes the partitions in parallel.
3. Pipelined parallelism: The runtime organizes the work of evaluating a query as a sequence of workers. As soon as one worker finishes a piece of data, it hands the result to the next worker and starts on the next piece. This is similar to an assembly line in a manufacturing plant.
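The three kinds of parallelism listed above can be illustrated on a single machine with a small sketch. It uses only the Java standard library and is not Hyracks or VXQuery code. The class name `ParallelismSketch`, its method names, and the toy computations (sums, a maximum, doubling and incrementing integers) are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// Illustration only: hypothetical names, plain JDK threads instead of a cluster.
public final class ParallelismSketch {
    private ParallelismSketch() {}

    // 1. Independent parallelism: two unrelated sub-computations run concurrently.
    public static int independent(List<Integer> a, List<Integer> b) {
        CompletableFuture<Integer> sumOfA = CompletableFuture.supplyAsync(
                () -> a.stream().mapToInt(Integer::intValue).sum());
        CompletableFuture<Integer> maxOfB = CompletableFuture.supplyAsync(
                () -> b.stream().mapToInt(Integer::intValue).max().orElse(0));
        return sumOfA.join() + maxOfB.join();
    }

    // 2. Partitioned parallelism: split the data, process each slice concurrently,
    //    then combine the partial results. Assumes partitions > 0.
    public static int partitioned(List<Integer> data, int partitions) {
        int sliceSize = (data.size() + partitions - 1) / partitions;
        List<CompletableFuture<Integer>> partials = new ArrayList<>();
        for (int start = 0; start < data.size(); start += sliceSize) {
            List<Integer> slice = data.subList(start, Math.min(start + sliceSize, data.size()));
            partials.add(CompletableFuture.supplyAsync(
                    () -> slice.stream().mapToInt(Integer::intValue).sum()));
        }
        return partials.stream().mapToInt(f -> f.join()).sum();
    }

    // 3. Pipelined parallelism: stage 1 doubles each item and hands it on,
    //    stage 2 adds one while stage 1 is already working on the next item.
    public static List<Integer> pipelined(List<Integer> data) {
        final int end = Integer.MIN_VALUE; // sentinel marking the end of the stream
        BlockingQueue<Integer> handoff = new ArrayBlockingQueue<>(2);
        List<Integer> results = new ArrayList<>();
        Thread stage1 = new Thread(() -> {
            try {
                for (int item : data) {
                    handoff.put(item * 2);
                }
                handoff.put(end);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread stage2 = new Thread(() -> {
            try {
                for (int item = handoff.take(); item != end; item = handoff.take()) {
                    results.add(item + 1);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        stage1.start();
        stage2.start();
        try {
            stage1.join();
            stage2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }
}
```

In a real deployment the slices and pipeline stages would live on different machines of the cluster rather than on threads in one process; the sketch only shows how the work is divided in each case.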
Hyracks provides a set of operators for evaluating queries in parallel. However, it knows nothing about VXQuery or the XQuery language and data model.
Currently we have a parser and a translator that convert XQuery into a logical form ready for evaluation.
The task for this project is to implement the functions that plug into Hyracks operators so that we can evaluate XQuery queries.
|1.||Missing from "Functions on String Values" Section||Open||Unassigned|
|2.||Accessor Functions Missing||Open||Unassigned|
|5.||Functions Missing for "String Functions that Use Pattern Matching"||Open|| |
|7.||Missing function related to QNames||Open||Unassigned|
|8.||Missing functions on Nodes||Open||Unassigned|
|9.||Missing context functions||Open|| |
|10.||Missing sequence functions||Open|| |
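To give a feel for the size of one such task, here is a minimal sketch of a single function from the "Functions on String Values" section of the XQuery 1.0 function library, `fn:translate`. Whether this particular function is among the missing ones is not stated here; it is picked only as an example. The class `FnTranslateSketch` and its plain `String` signature are assumptions made for illustration. The actual VXQuery function interface and the way functions are registered with Hyracks operators are not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a hypothetical standalone version of fn:translate,
// not tied to any VXQuery or Hyracks interface.
public final class FnTranslateSketch {
    private FnTranslateSketch() {}

    // Replaces each character of arg that occurs in mapString with the character
    // at the same position in transString. Characters of mapString that have no
    // counterpart in transString are removed from the result.
    public static String translate(String arg, String mapString, String transString) {
        if (arg == null) {
            return ""; // an empty-sequence argument yields the zero-length string
        }
        int[] from = mapString.codePoints().toArray();
        int[] to = transString.codePoints().toArray();

        // -1 marks a character that must be deleted. putIfAbsent keeps the first
        // occurrence when a character appears more than once in mapString.
        Map<Integer, Integer> mapping = new HashMap<>();
        for (int i = 0; i < from.length; i++) {
            mapping.putIfAbsent(from[i], i < to.length ? to[i] : -1);
        }

        StringBuilder out = new StringBuilder();
        arg.codePoints().forEach(cp -> {
            Integer replacement = mapping.get(cp);
            if (replacement == null) {
                out.appendCodePoint(cp);          // not in mapString: keep as is
            } else if (replacement >= 0) {
                out.appendCodePoint(replacement); // mapped to another character
            }                                     // otherwise: dropped
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("bar", "abc", "ABC"));     // prints "BAr"
        System.out.println(translate("--aaa--", "abc-", "ABC")); // prints "AAA"
    }
}
```

The sketch works on Unicode code points rather than `char` values so that characters outside the Basic Multilingual Plane are treated as single characters, as the XQuery data model requires.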