As discussed on the Beam dev list, we should have a second Spark runner based on the Dataset API.
As part of this, the Spark runner will have three modules: runner-spark-core, runner-spark-rdd (Spark 1.6.x), and runner-spark-dataset (Spark 2.x).
This work should go in a feature branch (runner-spark2 already exists).
This ticket covers creating a skeleton for the module structure described above and porting everything that can easily be ported from the current runner.
Some of the work is already in the current feature branch, but a lot has changed since the branch was last updated.